AN INVESTIGATION OF THE EFFECTS OF
CORRELATION, AUTOCORRELATION, AND
SAMPLE SIZE IN CLASSIFIER FUSION

THESIS 


Nathan J. Leap, Captain, USAF


AFIT/GOR/ENS/04-06 


DEPARTMENT  OF  THE  AIR  FORCE 
AIR  UNIVERSITY 

AIR  FORCE  INSTITUTE  OF  TECHNOLOGY 


Wright-Patterson  Air  Force  Base,  Ohio 


APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED.


The views expressed in this thesis are those of the author and do not reflect the
official policy or position of the United States Air Force, Department of Defense, or the
United States Government.




AN  INVESTIGATION  OF  THE  EFFECTS  OF  CORRELATION, 
AUTOCORRELATION,  AND  SAMPLE  SIZE  IN  CLASSIFIER  FUSION 

THESIS 


Presented to the Faculty
Department of Operational Sciences
Graduate School of Engineering and Management
Air Force Institute of Technology
Air University

Air Education and Training Command
In Partial Fulfillment of the Requirements for the
Degree of Master of Science in Engineering and Environmental Management


Nathan J. Leap, BS
Captain, USAF

March 2004


APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED.




AN  INVESTIGATION  OF  THE  EFFECTS  OF  CORRELATION, 
AUTOCORRELATION,  AND  SAMPLE  SIZE  IN  CLASSIFIER  FUSION 


Nathan  J.  Leap,  BS 
Captain,  USAF 


Approved:

//signed//                              03 Mar 04
Kenneth W. Bauer (Chairman)             date

//signed//                              03 Mar 04
Mark E. Oxley (Member)                  date




Abstract 

This thesis extends the research found in Storm, Bauer, and Oxley, 2003. Data
correlation effects and sample size effects on three classifier fusion techniques and one
data fusion technique were investigated. Identification System Operating Characteristic
Fusion (Haspert, 2000), the Receiver Operating Characteristic “Within” Fusion method
(Oxley and Bauer, 2002), and a Probabilistic Neural Network were the three classifier
fusion techniques; a Generalized Regression Neural Network was the data fusion
technique. Correlation was injected into the data set both within a feature set
(autocorrelation) and across feature sets for a variety of classification problems, and
sample size was varied throughout. Total Probability of Misclassification (TPM) was
calculated for some problems to show the effect of correlation on TPM. Feature selection
was performed in some experiments to show the effects of selecting only certain features.
Finally, experiments were designed and analyzed using analysis of variance to identify
what factors had the most significant impact on fusion algorithm performance.




Acknowledgments 


I would like to express my sincere thanks and appreciation to my faculty advisor,
Dr Kenneth Bauer, for his support and guidance throughout the thesis process. His
expertise and insights were invaluable, and he made the thesis effort an enjoyable
experience. His ability to develop ingenious solutions to difficult problems taught me
valuable lessons that not only helped me through the thesis effort but that I will use
throughout my career. I would also like to thank my reader, Dr Mark Oxley, for his
assistance with MATLAB code and mathematical notation.

I would also like to thank my classmates for their consistent support throughout
the past 18 months. Thank you for all of your help at AFIT. Finally and most
importantly, I would like to thank my wife for her constant love and support; I could not
have done it without you.


Nathan  J.  Leap 




Table of Contents

Abstract
Acknowledgments
List of Figures
List of Tables

I. Introduction
    Background
    Problem Statement
    Outline of Thesis

II. Literature Review
    Introduction
    Air Force Targeting
    Statistical Independence
    Fusion Methods
    ISOC Fusion Method
    Sensor Probability Matrices
    Combat ID System States
    Fusion Rules
    Optimal Rule Using Total Cost
    ROC Fusion Methods
    ROC “Within” Fusion Method
    PNN Fusion Method
    Generalized Regression Neural Network (GRNN) Model
    Sample Size Considerations
    Chapter Summary

III. Methodology
    Introduction
    Correlation
    Data Generation
    Problem 1: 4 Feature Case
    Problem 2: 8 Feature Case
    Problem 3: 8 Feature with Autocorrelation Case
    Problem 4: 8 Feature Triangle Case
    Problem 5: 8 Feature XOR Case
    Problem 6: 8 Feature XOR with Autocorrelation Case
    Problem 7: 20 Feature with Feature Selection Case
    Problem 8: 36 Feature Case with Feature Selection Case
    Experimental Design
    ISOC Application
    ROC “Within” OR Application
    PNN Application
    One Big Network Application
    Feature Selection
    Total Probability of Misclassification
    Sample Size Variation
    Chapter Summary

IV. Findings and Analysis
    Introduction
    Problem 1 Results: 4 Feature Case, Single Sample Size
    Problem 1 Results: 4 Feature Case, Varying Sample Size
    Problem 2 Results: 8 Feature Case, Single Sample Size
    Problem 2 Results: 8 Feature Case, Varying Sample Size
    Problem 3 Results: 8 Feature with Autocorrelation Case, Single Sample Size
    Problem 3 Results: 8 Feature with Autocorrelation Case, Across Sample Sizes
    Problem 3 Results: 8 Feature with Autocorrelation Case, An ANOVA Approach
    Problem 4 Results: 8 Feature Triangle Case, Single Sample Size
    Problem 4 Results: 8 Feature Triangle Case, Varying Sample Size
    Problem 5 Results: 8 Feature XOR Case, Single Sample Size
    Problem 5 Results: 8 Feature XOR Case, Varying Sample Size
    Problem 6 Results: 8 Feature XOR with Autocorrelation Case, Single Sample Size
    Problem 6 Results: 8 Feature with Autocorrelation Case, Across Sample Sizes
    Problem 6 Results: 8 Feature XOR with Autocorrelation Case, An ANOVA Approach
    Problem 7 Results: 20 Feature without Autocorrelation Case using Feature Selection
    Problem 8 Results: 36 Feature without Autocorrelation Case, Single Correlation,
        Single Sample Size, using Feature Selection
    Problem 9 Results: TPM Exploration
    Chapter Summary

V. Conclusion
    Introduction
    Literature Review Findings
    Methodology Employed
    Results
    Recommendations for Future Research

Bibliography
Vita

List of Figures

Figure 1: Sensor Fusion Process Overview
Figure 2: ISOC Sensor Fusion Process (Haspert, 2002)
Figure 3: ROC “Within” OR Sensor Fusion Process
Figure 4: A Probabilistic Neural Network (Wasserman and Nostrand, 1993)
Figure 5: PNN Sensor Fusion Process
Figure 6: Inter-correlation
Figure 7: Intra-correlation
Figure 8: 4 Feature ROC Curves, N=1000
Figure 9: 4 Feature ROC Curves, Across Sample Sizes
Figure 10: 8 Feature ROC Curves, N=1000
Figure 11: 8 Feature ROC Curves, Across Sample Sizes
Figure 12: Feature 1 over Time: 0.0 Level of Autocorrelation
Figure 13: Feature 1 over Time: 0.9 Level of Autocorrelation
Figure 14: 8 Feature with Autocorrelation Case, N=1000
Figure 15: 8 Feature with Autocorrelation Case, Across Sample Sizes
Figure 16: ISOC Histogram of Residuals
Figure 17: ISOC Residual TP Probability vs Row Number
Figure 18: ISOC Residuals vs Row Number, Resorted
Figure 19: ISOC Histogram of Average Residuals
Figure 20: ISOC Residual TP Probability vs Row Number
Figure 21: 8 Feature Triangle ROC Curves, N=1000
Figure 22: PNN Feature Space Plot, 0.0 Correlation, N=1000
Figure 23: PNN Feature Space Plot, 0.9 Correlation, N=1000
Figure 24: Feature Space of Feature 1 and Feature 2, 0.0 Correlation, N=1000
Figure 25: Feature Space of Feature 1 and Feature 2, 0.9 Correlation, N=1000
Figure 26: Feature Space of Feature 1 and Feature 4, 0.0 Correlation, N=1000
Figure 27: Feature Space of Feature 1 and Feature 4, 0.9 Correlation, N=1000
Figure 28: 8 Feature Triangle ROC Curves, Across Sample Sizes
Figure 29: 8 Feature XOR ROC Curves, N=1000
Figure 30: 8 Feature XOR ROC Curves with More Separation, N=1000
Figure 31: Individual Classifier ROC Curves for 0.0 Correlation
Figure 32: Individual Classifier ROC Curves for 0.9 Correlation
Figure 33: 8 Feature XOR ROC Curves, Across Sample Sizes
Figure 34: 8 Feature XOR with Autocorrelation Case, N=1000
Figure 35: 8 Feature XOR with Autocorrelation Case, Across Sample Sizes
Figure 36: PNN Fusion Feature Space, 0.0 Autocorrelation
Figure 37: PNN Fusion Feature Space, 0.9 Autocorrelation
Figure 38: ISOC Histogram of Residuals
Figure 39: ISOC Residual TP Probability vs Row Number
Figure 40: 20 Feature without Autocorrelation Case, N=50
Figure 41: 20 Feature without Autocorrelation Case, N=1000
Figure 42: True Positive Values vs. Correlation Level for 0.1 False Positive Rate
Figure 43: True Positive Values vs. Correlation Level for 0.1 False Positive Rate
Figure 44: Classification Accuracy vs. Number of Features for 3 Fusion Methods
Figure 45: TPM Values vs Correlation, 4 Feature No Autocorrelation Case
Figure 46: 4 Feature Problem, N=1000
Figure 47: 4 Feature Problem, N=50


List of Tables

Table 1: Sensor Probability Matrix
Table 2: Sensor Output State Combinations
Table 3: Sensor Output State Combinations, Two Sensors and Two Output States
Table 4: Data Generation Descriptions
Table 5: Results Descriptions
Table 6: Summary of ANOVA Results
Table 7: Summary of ANOVA Results, Averaged
Table 8: Summary of XOR ANOVA Results, Averaged


AN INVESTIGATION OF THE EFFECTS OF CORRELATION,
AUTOCORRELATION, AND SAMPLE SIZE IN CLASSIFIER FUSION

I.  Introduction 

Background 

In general, a classification problem is a situation where it is of interest to describe
members of a specific number of classes by certain attributes, or features, that the
members possess. In the Air Force, a common classification problem is trying to classify
targets as hostile, friendly, neutral, or otherwise, based upon certain features that each
class possesses. In Air Force Doctrine, the Air Force warns its members not to strike
targets based on single source intelligence; at some level, intelligence information should
be fused together (AFPAM 14-210). Many fusion models are based on the assumption
that each of the inputs to the model, in some cases individual classifiers, is independent.
In the real world, there are times when classifiers are looking at similar information and
are not actually independent; that is, knowing the output of one classifier provides
information about the output of another classifier. The more dependent one classifier is
on the other, the less new information is present from the additional classifier. In
addition, if a classifier is observing a target through time, each observation that it takes
may not be independent of the previous observations. Again, this means that less new
information is present if the observations are correlated in time. Not much is known
about the performance of fusion techniques when faced with correlation (Willett, et al.,
2000). In addition, the number of observations that are gathered can significantly impact
the performance of an individual classifier and thus the fusion of individual classifiers. If
there are many features present in each observation, it may be beneficial to select only
certain features that provide more information than others. This thesis examines the
effects of sample size, both of these types of correlation, and feature selection on four
different fusion models in a variety of different problems. Two of these models assume
that each classifier is independent of the other classifier: Identification System Operating
Characteristic (ISOC) (Haspert, 2002) and Receiver Operating Characteristic (ROC)
“Within” (Oxley and Bauer, 2002). The other two models make no such assumption:
Probabilistic Neural Network (PNN) and One Big Network (OBN).

Problem  Statement 

In  this  thesis,  the  effects  of  sample  size,  two  types  of  correlation,  and  feature 
selection  on  four  different  fusion  models  in  a  variety  of  different  problems  are  examined. 
Each  problem  is  constrained  to  a  two-class  problem  where  the  two  classes  are  friendly 
and  hostile,  and  for  each  problem,  only  two  classifiers  are  fused  via  each  fusion  method. 
The  fusion  models  are  first  tested  on  simple  problems,  and  the  problems  increase  in 
degree  of  complexity. 

Outline  of  Thesis 

This  thesis  is  divided  into  five  chapters:  Introduction,  Literature  Review, 
Methodology,  Findings  and  Analysis,  and  Conclusions.  The  following  is  a  brief 
description  of  the  contents  of  each  chapter. 

Chapter  1:  Introduction  -  This  chapter  discusses  the  background,  problem 
statement,  and  outline  of  the  thesis. 




Chapter 2: Literature Review - This chapter summarizes the pertinent literature
on reasons for fusing classifiers, four types of classifier fusion, statistical independence of
classifiers, and sample size considerations.

Chapter 3: Methodology - This chapter describes the general methodology
employed in this thesis. It describes the two types of correlation, the data generation
process for each of the different problems, application issues for each of the four fusion
methods, feature selection, TPM, and sample size variation.

Chapter 4: Findings and Analysis - This chapter describes the findings and
analysis for each of the problems explored in this thesis.

Chapter 5: Conclusion - This chapter summarizes the results of the research and
provides suggestions for future research.


II. Literature Review


Introduction 

This chapter provides a summary of the pertinent literature available on reasons
for fusing classifiers as well as classifier fusion techniques. First, the Air Force mandates
that fusion take place when attacking a target; reasons for fusing classifiers are given in
Air Force Doctrine. Next, the assumption of statistical independence of the classifiers is
discussed. Then, details from each of the four fusion models are provided. Finally, some
sample size considerations are discussed.

Air  Force  Targeting 

Any time the United States Air Force prepares to attack a target, there are six
steps necessary for the mission: detection, location, combat identification, decision,
execution, and assessment (AFPAM 14-210, 1998). Often, combat identification is
perceived as the weakest of these six steps since no sensor performs perfectly all of the
time (Haspert, 2000). Commanders should be cautious of even the best intelligence on a
target, especially when it comes from a single source (AFDD 2-1, 2000). Normally,
intelligence on a target should not be based on a single source (AFPAM 14-210, 1998).
This leads to the implementation of multiple sensors; combining information from
multiple sources, known as data fusion, increases the confidence in the combat
identification step (AFPAM 14-210, 1998). Data fusion also increases the reliability and
credibility of the information (AFPAM 14-210, 1998). Combining outputs from
multiple sensors in order to get a better overall classification accuracy is called sensor
fusion. This thesis focuses on improving combat identification through sensor fusion.


Statistical  Independence 

Many sensor fusion methods make the assumption that the individual sensors are
statistically independent. If two or more sensors are statistically independent, it makes
sense to combine these sensors to make a better overall decision. However, if two or
more sensors are identical, no more information can be gained by adding the additional
sensors (Shipp and Kuncheva, 2002). In the real world, there are times when the features
observed by one sensor are correlated with features observed by another sensor; this
creates statistically dependent sensors. Little is known about how sensor fusion methods
perform in the presence of statistical dependence since most methods assume statistical
independence (Willett, et al., 2000). In previous research, the Gaussian shift-in-means
problem was examined in the presence of correlation, and this problem can be broken
down into three regions: the “good,” the “bad,” and the “ugly” (Willett, et al., 2000). It
was shown that for the logical “and” and logical “or” rules, any problem in the “good”
threshold region should use optimal sensor rules just like those used in the presence of
statistical independence (Willett, et al., 2000).

Fusion  Methods 

Three  methods  of  sensor  fusion  are  used  in  this  thesis:  Identification  System 
Operating  Characteristic  (ISOC)  Fusion,  Receiver  Operating  Characteristic  (ROC) 
“Within”  Fusion,  and  Probabilistic  Neural  Network  (PNN)  Fusion.  Although  these 
methods  take  different  approaches,  they  have  the  same  overall  goal.  Each  sensor  fusion 
method  seeks  to  improve  upon  the  classification  accuracy  of  a  single  sensor  by 
combining  the  outputs  of  multiple  sensors  into  a  single  output.  Figure  1  shows  the 
overall  sensor  fusion  process. 




Figure  1:  Sensor  Fusion  Process  Overview. 


ISOC  Fusion  Method 

The  Identification  System  Operating  Characteristic  (ISOC)  method  determines  the 
optimal fusion rule set for a given threshold through a novel algorithm (Haspert, 2000).
This is a paradigm shift from the traditional fixed rules. Although fixed rules are easy to
employ,  they  are  not  usually  optimal  (Haspert,  2000).  On  the  other  hand,  adaptive  rules 
such  as  Bayesian  techniques  find  an  optimal  ID  sensor  fusion  rule  based  on  data  from  a 
specific  target  instead  of  fixing  a  rule  across  all  data  sets  (Haspert,  2000).  These  adaptive 
rules  are  based  on  the  results  of  individual  classifiers  through  a  sensor  probability  matrix 
(Haspert,  2000). 

Sensor  Probability  Matrices 

Combat Identification Systems (CIS) take inputs from individual sensors and
combine these inputs to form an overall classification (Haspert, 2000). The output of
each individual sensor for a given threshold can be expressed in the format shown
in Table 1.


Table 1: Sensor Probability Matrix.

                       Indication
  Target          “H”             “F”
    H           P(“H”|H)        P(“F”|H)
    F           P(“H”|F)        P(“F”|F)

The  values  in  this  table  will  change  as  the  threshold  changes  for  each  individual  sensor. 
The  rows  of  this  matrix  represent  the  possible  types  of  targets  that  each  individual  sensor 
can  observe  where  F  represents  friend  and  H  represents  hostile,  and  the  columns  of  this 
matrix  represent  the  possible  sensor  outputs.  P(“H”|H)  is  the  conditional  probability  of 
the  sensor  designating  the  target  as  “H”  given  the  target  is  a  hostile.  The  other 
conditional  probabilities  are  similar.  In  this  case,  the  indication  “H”  is  considered  a 
positive and the indication “F” is considered a negative. Therefore, P(“H”|H) is the
probability of a true positive, P(“H”|F) is the probability of a false positive, P(“F”|H) is
the probability of a false negative, and P(“F”|F) is the probability of a true negative.
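
As an illustration of how the entries of Table 1 might be estimated, the following minimal sketch thresholds a sensor's posterior probability of hostile and tabulates the indications by true class. The function name, the score and label lists, and the 0.5 threshold are assumptions made for this example only; they are not taken from the thesis code.

```python
def sensor_probability_matrix(scores, truth, threshold=0.5):
    """Estimate P(indication | truth) for one sensor at one threshold.

    scores: posterior probabilities that each target is hostile (assumed input).
    truth:  true class of each target, "H" or "F".
    Returns matrix[truth][indication], mirroring the layout of Table 1.
    Assumes both classes appear at least once in `truth`.
    """
    counts = {"H": {"H": 0, "F": 0}, "F": {"H": 0, "F": 0}}
    for score, label in zip(scores, truth):
        indication = "H" if score >= threshold else "F"
        counts[label][indication] += 1

    matrix = {}
    for label, row in counts.items():
        total = row["H"] + row["F"]
        matrix[label] = {ind: n / total for ind, n in row.items()}
    return matrix


# Hypothetical scores: three hostile targets followed by three friendly targets.
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
truth = ["H", "H", "H", "F", "F", "F"]
spm = sensor_probability_matrix(scores, truth)
print(spm["H"]["H"], spm["F"]["H"])  # true positive and false positive estimates
```

Re-running the function with a different threshold changes every entry, which is why the values in Table 1 depend on the threshold chosen for each sensor.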

Combat  ID  System  States 

Let $N_s$ be the number of sensors on a target. Let $i$ denote the index of those
sensors where $1 \le i \le N_s$. Let $n_i$ denote the number of indicator states for sensor $i$.
Let $k_i$ be a specific output state for sensor $i$. Using these definitions, there will be $N$
total distinct configurations of the Combat ID system where

$$N = \prod_{i=1}^{N_s} n_i \quad \text{(Ralston, 1998)}.$$

Let $S = \bigcup_{j=1}^{N} S_j$ be all possible configurations of the CIS where $S_j$ is the
$j$th output state of the CIS and $1 \le j \le N$. Each $S_j = \{s_1^j, s_2^j, \ldots, s_{N_s}^j\}$
where $s_i^j$ is the state of the $i$th sensor in the $j$th configuration (Storm, Bauer, and
Oxley, 2003). Thus, $S$ is an $N \times N_s$ matrix. Table 2 shows the possible combinations of $S$.

Table 2: Sensor Output State Combinations.

  j      S_j
  1      $(s_1^1, s_2^1, s_3^1, \ldots, s_{N_s}^1)$
  2      $(s_1^2, s_2^2, s_3^2, \ldots, s_{N_s}^2)$
  3      $(s_1^3, s_2^3, s_3^3, \ldots, s_{N_s}^3)$
  ...    ...
  N      $(s_1^N, s_2^N, s_3^N, \ldots, s_{N_s}^N)$


For the two sensor, two state case, $S$ is a 4 x 2 matrix. Table 3 shows the possible
combinations of $S$ for this case.

Table 3: Sensor Output State Combinations, Two Sensors and Two Output States.

  j      S_j
  1      $(s_1^1, s_2^1)$ = (“H”,“H”)
  2      $(s_1^2, s_2^2)$ = (“H”,“F”)
  3      $(s_1^3, s_2^3)$ = (“F”,“H”)
  4      $(s_1^4, s_2^4)$ = (“F”,“F”)

Under the assumption that all sensors are independent, the probability of a sensor
configuration given truth simply equals the product of the probabilities of the
individual sensors in that configuration given truth (Ralston, 1998). This is given by the
following equation

$$P(S_j \mid T) = \prod_{i=1}^{N_s} P(s_i^j \mid T).$$

For the two-class problem, in the previous equation, $T \in \{H, F\}$. For each possible output
combination, $S_j$, the probabilities $P(S_j \mid H)$ and $P(S_j \mid F)$ must be calculated. Since every
potential target, regardless of whether it is friendly or hostile, will put the CIS into some
state, $\sum_{j=1}^{N} P(S_j \mid F) = \sum_{j=1}^{N} P(S_j \mid H) = 1$ (Ralston, 1998). After all these
probabilities have been calculated, the fusion rules must be defined (Ralston, 1998).
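
The product rule for configuration probabilities can be sketched as follows for the two sensor, two state case. The two sensor probability matrices hold made-up values chosen only for illustration, and the function name is hypothetical; the configurations are enumerated in the same order as Table 3.

```python
from itertools import product
from math import prod


def configuration_probabilities(sensor_matrices, truth):
    """Return {configuration: P(S_j | truth)} assuming independent sensors.

    sensor_matrices: one dict per sensor, indexed as matrix[truth][indication].
    """
    states = [list(m[truth].keys()) for m in sensor_matrices]
    probs = {}
    for config in product(*states):
        # Independence: multiply the individual sensor probabilities.
        probs[config] = prod(m[truth][s] for m, s in zip(sensor_matrices, config))
    return probs


# Illustrative (assumed) sensor probability matrices for two sensors.
sensor_1 = {"H": {"H": 0.8, "F": 0.2}, "F": {"H": 0.1, "F": 0.9}}
sensor_2 = {"H": {"H": 0.7, "F": 0.3}, "F": {"H": 0.2, "F": 0.8}}

p_given_h = configuration_probabilities([sensor_1, sensor_2], "H")
p_given_f = configuration_probabilities([sensor_1, sensor_2], "F")
for config in p_given_h:
    print(config, round(p_given_h[config], 4), round(p_given_f[config], 4))
```

Each set of conditional probabilities sums to one, as stated above, because every target places the CIS in exactly one configuration.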

Fusion  Rules 

There will be times when the CIS will receive conflicting indications from the
individual sensors. The fusion rules resolve all of these conflicts by specifying when to
declare hostile and when not to declare hostile (Ralston, 1998). In the two state problem,
a complete ID fusion rule can be expressed as an N-dimensional vector $R = (r_1, r_2, \ldots, r_N)$
where $r_j \in \{0, 1\}$ and $j = 1, 2, \ldots, N$ (Ralston, 1998). In this case, each element of $R$
corresponds to an element of $S$. If $r_j = 1$, rule $S_j$ should be included in the rule set
(Ralston, 1998). For example, in the two-class, two sensor problem defined above, if $R$ =
(1, 0, 1, 0), rules $S_1$ = (“H”,“H”) and $S_3$ = (“F”,“H”) should be included in the rule set.
Thus, a target will be declared hostile if either $S_1$ or $S_3$ occurs. For each specific fusion
rule, the probability of that rule given truth can be found with the following equation.

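$$P(R \mid T) = \sum_{j=1}^{N} P(S_j \mid T) \cdot r_j.$$

By substituting the equation above,

$$P(R \mid T) = \sum_{j=1}^{N} \left( \prod_{i=1}^{N_s} P(s_i^j \mid T) \right) \cdot r_j.$$

For the two-class problem where $T \in \{H, F\}$, an equation for each element of $T$
follows:

$$P(R \mid H) = \sum_{j=1}^{N} \left( \prod_{i=1}^{N_s} P(s_i^j \mid H) \right) \cdot r_j$$

$$P(R \mid F) = \sum_{j=1}^{N} \left( \prod_{i=1}^{N_s} P(s_i^j \mid F) \right) \cdot r_j$$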
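
A short worked example may help here. The sketch below evaluates the rule R = (1, 0, 1, 0) from the text against a set of configuration probabilities; the numerical values are assumptions chosen for illustration (they correspond to two hypothetical independent sensors), and the function name is not from the thesis.

```python
def rule_probability(config_probs, rule):
    """P(R | T): sum of P(S_j | T) over the configurations with r_j = 1."""
    return sum(p * r for p, r in zip(config_probs, rule))


# Assumed P(S_j | H) and P(S_j | F), ordered as in Table 3:
# ("H","H"), ("H","F"), ("F","H"), ("F","F").
p_s_given_h = [0.56, 0.24, 0.14, 0.06]
p_s_given_f = [0.02, 0.08, 0.18, 0.72]

rule = (1, 0, 1, 0)  # declare hostile when S_1 or S_3 occurs
p_true_positive = rule_probability(p_s_given_h, rule)   # P(R | H)
p_false_positive = rule_probability(p_s_given_f, rule)  # P(R | F)
print(round(p_true_positive, 4), round(p_false_positive, 4))
```

With these assumed values the rule yields P(R|H) = 0.70 and P(R|F) = 0.20, which is simply the performance of the second sensor alone, since S_1 and S_3 are exactly the configurations in which sensor 2 indicates “H”.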

Now that these probabilities have been calculated, $R$ must be chosen so that the
probability of a true positive, $P(R \mid H)$ in the two-class problem, is maximized while the
probability of a false positive, $P(R \mid F)$ in the two-class problem, is minimized (Ralston,
1998). However, there are a total of $2^N$ distinct possible Boolean fusion rules. When $N$ is
large, it is not feasible to test this many rules, but a smaller subset of all possible fusion
rules that represents the best performance can be defined and selected for a given sensor
suite (Ralston, 1998).

When finding the subset of all possible Boolean fusion rules, there are two
obvious rules: “never declare hostile” and “always declare hostile.” The least
conservative rule is “always declare hostile” where $r_j = 1$ for all $j$. The most
conservative rule is “never declare hostile” where $r_j = 0$ for all $j$ (Ralston, 1998). The
next most conservative rule is the rule which includes just one state which has the highest
likelihood ratio $P(S_j \mid H)/P(S_j \mid F)$. This fusion rule is better than any other fusion rule with
just one single state included. The next fusion rule includes the previous rule as well as
the next most likely state (i.e., the second rule includes two states). This fusion rule is
better than any other fusion rule with just two states included. This process is repeated
until the least conservative rule is reached or $r_j = 1$ for all $j$ (Ralston, 1998). In essence,
this method creates the ISOC boundary. The following ISOC boundary algorithm will
create this boundary (Storm, Bauer, and Oxley, 2003).

1. Compute $P(S_j \mid T)$ for all $j$ and $T$ using data from the sensor probability matrices from
the individual sensors.

2. Compute $LR_j = P(S_j \mid H)/P(S_j \mid F)$ for all $j$, the likelihood ratio for all sensor output state
combinations.

3. Rank $LR_j$ for all $j$ from highest to lowest, where $LR_{(1)}$ is the largest $LR_j$ and $LR_{(N)}$ is
the smallest $LR_j$, such that $LR_{(1)} \ge LR_{(2)} \ge \cdots \ge LR_{(N)}$.

4. Choose $S_j$ corresponding to the largest remaining $LR_j$ to be included in the fusion
rule (i.e., $r_j = 1$ in $R$).

5. Go to 3 unless $r_j = 1$ for all $j$.
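
The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: the function name is hypothetical, the input probabilities are made-up values for the two sensor, two state case, and a configuration with P(S_j|F) = 0 is treated as having an infinite likelihood ratio.

```python
def isoc_boundary(p_s_given_h, p_s_given_f):
    """Return the ISOC boundary as a list of (rule, P(R|F), P(R|H)) tuples.

    Configurations are switched on one at a time in decreasing order of
    likelihood ratio, starting from the "never declare hostile" rule.
    """
    n = len(p_s_given_h)

    def likelihood_ratio(j):
        # Assumption: a state never produced by a friend ranks first.
        if p_s_given_f[j] == 0:
            return float("inf")
        return p_s_given_h[j] / p_s_given_f[j]

    order = sorted(range(n), key=likelihood_ratio, reverse=True)

    rule = [0] * n
    p_fp, p_tp = 0.0, 0.0
    boundary = [(tuple(rule), p_fp, p_tp)]  # most conservative rule
    for j in order:
        rule[j] = 1                 # include the next most likely state
        p_fp += p_s_given_f[j]
        p_tp += p_s_given_h[j]
        boundary.append((tuple(rule), p_fp, p_tp))
    return boundary


# Assumed configuration probabilities, ordered as in Table 3.
p_s_given_h = [0.56, 0.24, 0.14, 0.06]
p_s_given_f = [0.02, 0.08, 0.18, 0.72]

boundary = isoc_boundary(p_s_given_h, p_s_given_f)
for rule, p_fp, p_tp in boundary:
    print(rule, round(p_fp, 2), round(p_tp, 2))
```

Sorting once and then accumulating is equivalent to repeatedly selecting the largest remaining likelihood ratio, and it produces the N+1 candidate rules that run from “never declare hostile” to “always declare hostile.”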

Using the data from the sensor probability matrices, the $N$ distinct CIS
configurations are tested and “turned on” in decreasing order of their likelihood ratios
(Ralston, 1998). In a system with $N$ states, there will be $N+1$ points that connect the
most conservative rule and least conservative rule. Each of these points is a valid fusion
rule; each rule provides an alternative trade-off between fratricide (incorrectly targeting a
friendly) and effectiveness (correctly targeting a hostile). There is no rule that provides a
higher level of effectiveness at the same or lower fratricide rate; there is no rule that can
provide a lower level of fratricide at the same or higher level of effectiveness (Ralston,
1998). The optimal trade-off between fratricide and effectiveness depends on combat
requirements (Ralston, 1998).

Optimal  Rule  Using  Total  Cost 

Now that a subset of all possible rules has been identified, the optimal rule must
be chosen. For each of the rules in the subset, a cost can be calculated. These costs
depend only on the prior probabilities and the relative costs (Haspert, 2000). The cost
equation is given by

$$C_{Total} = C_{False\ Negative} \cdot P_{Hostile} \cdot P(\text{False Negative}) + C_{False\ Positive} \cdot P_{Friend} \cdot P(\text{False Positive})$$

where

$C_{Total}$ = Total Cost
$C_{False\ Negative}$ = Cost of False Negative
$C_{False\ Positive}$ = Cost of False Positive
$P_{Hostile}$ = Prior Probability of a Hostile
$P_{Friend}$ = Prior Probability of a Friend
P(True Positive) = $P(R \mid H)$ = Probability of a True Positive
P(False Negative) = $1 - P(R \mid H)$ = Probability of a False Negative
P(False Positive) = $P(R \mid F)$ = Probability of a False Positive
P(False Negative) = 1 - P(True Positive) (Haspert, 2000).

Finally, the lowest cost rule can be chosen as the optimal rule. Figure 2 is a process
diagram of the ISOC Fusion Method.
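
To make the cost-based selection concrete, the sketch below evaluates the total cost equation for a handful of candidate rules and picks the cheapest. The candidate (false positive, true positive) pairs, the equal priors, and the unit costs are all assumed values for illustration; the example also assumes a two-class problem so that the prior for a friend is one minus the prior for a hostile.

```python
def total_cost(p_fp, p_tp, prior_hostile, cost_fn, cost_fp):
    """Total cost of a fusion rule with the given false/true positive rates."""
    p_fn = 1.0 - p_tp                    # P(False Negative) = 1 - P(True Positive)
    prior_friend = 1.0 - prior_hostile   # assumption: only two classes
    return cost_fn * prior_hostile * p_fn + cost_fp * prior_friend * p_fp


# Assumed candidate rules as (P(R|F), P(R|H)) pairs along an ISOC boundary.
candidates = [(0.0, 0.0), (0.02, 0.56), (0.10, 0.80), (0.28, 0.94), (1.0, 1.0)]

costs = [
    total_cost(p_fp, p_tp, prior_hostile=0.5, cost_fn=1.0, cost_fp=1.0)
    for p_fp, p_tp in candidates
]
best_index = min(range(len(costs)), key=costs.__getitem__)
print(best_index, candidates[best_index], round(costs[best_index], 4))
```

Raising the cost of a false positive relative to a false negative shifts the minimum toward the more conservative rules, which reflects the dependence on combat requirements noted earlier.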


Figure 2: ISOC Sensor Fusion Process. (Diagram blocks: Training Features; Testing Features; Individual Sensor Classification; Posterior Probabilities; Data Pre-Processing for ISOC; Predicted Class Using 0.5 Threshold; Sensor Probability Matrix.)


ROC  Fusion  Methods 


Two  possible  procedures  for  sensor  fusion  are  called  the  ROC  “Across”  Fusion 
Method  and  the  ROC  “Within”  Fusion  Method.  ROC  “Across”  Fusion  is  applicable 
when  multiple  sensors  are  monitoring  multiple  critical  components  in  different  feature 
sets.  The  ROC  “Across”  Method  is  concerned  with  the  state  of  a  collection  of 
components  as  viewed  by  multiple  sensors.  On  the  other  hand,  ROC  “Within”  Fusion  is 
applicable  when  multiple  sensors  are  monitoring  the  same  critical  component  in  the  same 
feature  set.  The  ROC  “Within”  Method  is  concerned  with  the  state  of  a  single  component 
as  viewed  by  multiple  sensors.  In  this  thesis,  only  the  ROC  “Within”  Method  is  used  for 
sensor  fusion. 

ROC  “Within”  Fusion  Method 

While  the  ISOC  method  finds  the  optimal  rule  for  a  given  threshold,  the  ROC 
“Within”  Fusion  method  finds  the  optimal  thresholds  for  each  classifier  for  a  given  rule, 
the  “logical  or”  rule.  The  “Within”  Fusion  method  fuses  the  ROC  curves  together  from 
individual  sensors  using  the  same  or  different  feature  sets  to  form  a  fused  ROC  curve 
(Clutz,  2000).  Each  individual  classifier  will  output  a  sensor  probability  matrix  shown  in 
Table  1  where  the  definitions  of  true  positive,  false  positive,  true  negative,  and  false 
negative  are  the  same  as  above.  Again,  any  indication  “H”  (“H”|H  and  “H”|F)  is 
considered  a  positive  and  any  indication  “F”  (“F”|H  and  “F”|F)  is  considered  a  negative. 
Let $P_{TP}^{A}$ be the probability of true positive for classifier A, $P_{FP}^{A}$ be the probability of false
positive for classifier A, $P_{TN}^{A}$ be the probability of true negative for classifier A, and
$P_{FN}^{A}$ be the probability of false negative for classifier A. Let $P_{TP}^{B}$ be the probability of true
positive for classifier B, $P_{FP}^{B}$ be the probability of false positive for classifier B, $P_{TN}^{B}$ be the
probability of true negative for classifier B, and $P_{FN}^{B}$ be the probability of false negative
for classifier B (Clutz, 2000). The ROC curve for each classifier is the set of coordinate
points  where  a  value  of  true  positive  (ordinate)  is  specified  for  every  value  of  false 
positive  (abscissa).  Each  of  these  coordinate  points  corresponds  to  a  different  threshold 
value  for  the  individual  classifier.  The  ROC  “Within”  Fusion  Method  uses  these 
coordinate  pairs,  at  common  points  along  the  abscissa  for  classifier  A  and  classifier  B,  to 
form  a  new  fused  ROC  curve  (Clutz,  2000).  Let  classifier  C  be  the  classifier  resulting 
from  fusing  classifiers  A  and  B  according  to  the  “logical  or”  rule.  Classifier  C  will  result 
in  a  positive  indication  in  three  cases:  when  both  classifier  A  and  classifier  B  indicate 
positive,  when  only  classifier  A  indicates  positive,  and  when  only  classifier  B  indicates 
positive. For a two-class problem, $P_{FP} = 1 - P_{TN}$, which implies $P_{FP}^{C} = 1 - P_{TN}^{C}$. Assuming
the logical “or” rule is used and assuming the independence of classifiers A and B, then

$$P_{FP}^{C} = 1 - P_{TN}^{A} P_{TN}^{B} = 1 - (1 - P_{FP}^{A})(1 - P_{FP}^{B}) = P_{FP}^{A} + P_{FP}^{B} - P_{FP}^{A} P_{FP}^{B}.$$

For a two-class problem, $P_{TP} = 1 - P_{FN}$, so that $P_{TP}^{C} = 1 - P_{FN}^{C}$. Assuming
independence of classifiers A and B, then (Clutz, 2000)

$$P_{TP}^{C} = 1 - P_{FN}^{A} P_{FN}^{B} = 1 - (1 - P_{TP}^{A})(1 - P_{TP}^{B}) = P_{TP}^{A} + P_{TP}^{B} - P_{TP}^{A} P_{TP}^{B}.$$

Thus, the point on the fused ROC curve is given by the coordinate pair (Clutz, 2000)

$$\left(P_{FP}^{C}, P_{TP}^{C}\right) = \left(P_{FP}^{A} + P_{FP}^{B} - P_{FP}^{A} P_{FP}^{B},\; P_{TP}^{A} + P_{TP}^{B} - P_{TP}^{A} P_{TP}^{B}\right).$$

Using these results, an optimization algorithm can be used to form the fused ROC
curve and find the optimal thresholds for each individual classifier. Let $p$ be a value of
false positive for classifier A and $f_A(p)$ be a value of true positive for classifier A.
Similarly, let $q$ be a value of false positive for classifier B and $f_B(q)$ be a value of true
positive for classifier B. Let $r^*$ be a value of the false positive for the fused classifier C
and $f_C(r^*)$ be a value of true positive for the fused classifier C. It should be noted that $q$ is
a function of $r$ and $p$; that is

$$r = p + q - pq.$$

Thus, $q = Q(r, p) = \dfrac{r - p}{1 - p}$. Using this notation, the equation above can be rewritten as

$$\left(r, f_C(r)\right) = \left(p + q - pq,\; \max_{0 \le p \le r}\left[f_A(p) + f_B(Q(r,p)) - f_A(p) f_B(Q(r,p))\right]\right).$$

Now, for each value of $r$, a value of $p$, denoted $p^*$, can be found such that
$f_A(p) + f_B(Q(r,p)) - f_A(p) f_B(Q(r,p))$ is maximized subject to $0 \le p \le r$ (Storm, Bauer,
and Oxley, 2003). After $p^*$ is determined, $f_A(p^*)$ can be read from the ROC curve for
classifier A. The optimal threshold for classifier A is the value $\theta^*$ that yields $p^*$ and
$f_A(p^*)$. Using the relationship $r^* = p^* + q^* - p^* q^*$, $q^*$ can be determined, and $f_B(q^*)$
can be read from the ROC curve for classifier B. The optimal threshold for classifier B is
the value $\varphi^*$ that yields $q^*$ and $f_B(q^*)$. This can be done for all values of $r^*$. After
thresholds for each classifier have been found for each value of $r^*$, these thresholds can be
applied to an independent data set for validation. Figure 3 is a process diagram of the
ROC “Within” OR Fusion Process.
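
The search for $p^*$ described above can be illustrated with a minimal sketch. It assumes each ROC curve is available as a list of (false positive, true positive) points sorted by increasing false positive rate, uses linear interpolation between points, and replaces the optimization algorithm with a simple grid search over $p$; the function names `interp` and `fuse_or` and the grid size are hypothetical choices for illustration, not part of the method as published.

```python
from bisect import bisect_right

def interp(x, xs, ys):
    """Piecewise-linear interpolation of a ROC curve given as sorted points."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)
    w = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + w * (ys[j] - ys[j - 1])

def fuse_or(roc_a, roc_b, r, steps=200):
    """Return (p*, q*, fused true positive rate) for one fused false positive rate r < 1.

    roc_a and roc_b are (fpr_list, tpr_list) pairs sorted by increasing fpr.
    The grid search over p is an illustrative stand-in for the optimizer.
    """
    best = None
    for k in range(steps + 1):
        p = r * k / steps                 # candidate false positive rate for A
        q = (r - p) / (1.0 - p)           # Q(r, p): implied false positive rate for B
        fa = interp(p, *roc_a)
        fb = interp(q, *roc_b)
        fused = fa + fb - fa * fb         # true positive rate under the OR rule
        if best is None or fused > best[2]:
            best = (p, q, fused)
    return best
```

Because $p = r$ gives $q = 0$ and $p = 0$ gives $q = r$, the fused curve found this way is never below either individual curve when both curves pass through the origin. The thresholds $\theta^*$ and $\varphi^*$ would then be looked up from whatever threshold values produced $p^*$ and $q^*$ on the individual curves.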


[Figure 3: ROC “Within” Sensor Fusion Process. Diagram labels: training features and testing features; individual sensor classification; posterior probabilities from each sensor; data pre-processing for ROC “Within”; predicted class for every threshold from 0.0 to 1.0; sensor probability matrix for every threshold.]



PNN  Fusion  Method 


The  probabilistic  neural  network  (PNN)  fusion  method  is  a  simplistic  fusion 
method  that  involves  training  a  PNN  on  the  posterior  probabilities  from  the  individual 
classifiers.  The  result  is  a  single,  fused  classification.  The  PNN  has  been  used 
successfully  to  solve  a  variety  of  classification  problems  (Wasserman  and  Nostrand, 
1993). Compared with the standard back-propagation algorithm, the PNN has the
following major advantages: rapid training, guaranteed convergence to a Bayes
optimal classifier given enough training data, the ability to add or delete training
data without retraining, and a confidence indication on its output (Wasserman and
Nostrand, 1993).


[Figure 4 diagram: inputs X1, X2, ..., Xn feed the Distribution Layer, followed by the Pattern Layer, the Summation Layer, and the Decision Layer.]

Figure 4: A Probabilistic Neural Network (Wasserman and Nostrand, 1993).

This method is based on the assumption that the feature sets are normalized and
independent and identically distributed multivariate normal with common variance $\sigma^2$.
The  normalized  input  vector  X  =  (Xi,  X2,  . . .  ,  Xn)  is  applied  to  the  distribution  layer 
neurons.  This  input  vector  contains  the  features  to  be  classified  by  the  PNN.  The 
distribution layer does not perform any calculations; it is simply a connection point
(Wasserman and Nostrand, 1993). Each training vector is used to calculate a set of
weights where each weight has a value from a component of that vector. The pattern
layer neurons are grouped together by the true classification of the associated training
vector; these individual neurons sum the weighted inputs from the distribution layer
neurons (Wasserman and Nostrand, 1993). This is equivalent to taking the squared
distance between the input vector and the training vector, $(X - X_{Ri})^T (X - X_{Ri})$, where
$X_{Ri}$ is the $i$th exemplar in the $R$th class from the training set. Because of the
normalization, this reduces, up to a constant factor, to $(X^T X_{Ri} - 1)$ (Wasserman and
Nostrand, 1993). Then, the pattern layer neurons apply a
nonlinear function to the corresponding sum. This produces the output $Z_{c,i}$, where $c$
indicates the true class of the training vector and $i$ indicates the pattern layer neuron
(Wasserman and Nostrand, 1993). The nonlinear function for $Z_{c,i}$ is

$$Z_{c,i} = \exp\left[\frac{X^T X_{Ri} - 1}{\sigma^2}\right].$$

In this equation, $X$ is defined above and the set of weights corresponding to a pattern
neuron represent a training vector $X_{Ri} = (X_{R1}, X_{R2}, \ldots, X_{Rn})$. The summation layer
simply sums the $Z_{c,i}$ for each class (Wasserman and Nostrand, 1993). Thus, the output of
the summation layer for a specific class, $S_c$, is

$$S_c = \sum_{i=1}^{n_c} Z_{c,i},$$

where $n_c$ denotes the number of pattern layer neurons in class $c$.

The  decision  layer  compares  Sc  for  all  classes  and  assigns  the  input  vector  to  the 
class  with  the  largest  corresponding  Sc.  In  essence,  this  PNN  assigns  a  new  feature  set  to 
the  class  that  the  feature  set  has  the  largest  probability  of  being  in  under  the  multivariate 
normal  distribution.  A  PNN  can  be  extended  to  any  number  of  classes  by  adding  pattern 
layer neurons and a summation layer neuron for each class (Wasserman and Nostrand,
1993). Figure 5 is a process diagram of the PNN Fusion Process. The PNN Fusion
process only uses half the data that the ISOC and ROC “Within” Fusion methods use.


Figure 5: PNN Sensor Fusion Process.

Generalized  Regression  Neural  Network  (GRNN)  Model 

A  GRNN  has  a  very  similar  structure  to  a  PNN,  but  it  has  one  slight  difference. 
While  the  PNN  simply  sums  the  nonlinear  function  for  each  class,  the  GRNN  also  sums 
this  across  all  classes.  Then,  each  Sc  is  divided  by  the  sum  of  all  the  Sc  values  and  that  is 
the  corresponding  activation.  Thus,  all  activations  are  standardized  to  a  value  between  0 
and  1  (Wasserman  and  Nostrand,  1993). 
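
The pattern, summation, and decision layers of the PNN, together with the GRNN-style normalization just described, can be sketched as follows. This is a minimal illustration under stated assumptions: vectors are normalized to unit length inside the function, the smoothing parameter `sigma=0.5` is an arbitrary placeholder, and the names `pnn_classify` and `grnn_activations` are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length, as the PNN description assumes."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def pnn_classify(x, training, sigma=0.5):
    """training maps class label -> list of training vectors (exemplars).

    Returns the winning label and the summation-layer outputs S_c.
    """
    x = normalize(x)
    scores = {}
    for label, exemplars in training.items():
        total = 0.0
        for w in exemplars:
            w = normalize(w)
            dot = sum(a * b for a, b in zip(x, w))
            total += math.exp((dot - 1.0) / sigma ** 2)   # pattern layer output Z_{c,i}
        scores[label] = total                              # summation layer output S_c
    return max(scores, key=scores.get), scores             # decision layer: largest S_c

def grnn_activations(scores):
    """Divide each S_c by the sum over all classes so activations lie in [0, 1]."""
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}
```

For example, with `training = {0: [[1.0, 0.1]], 1: [[0.1, 1.0]]}`, the input `[1.0, 0.0]` is assigned to class 0, and the normalized activations of the two classes sum to one.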




Sample  Size  Considerations 

In real world problems, distribution parameters are not known, and analysts are
typically constrained to small training sets. The size of the training set, especially relative
to  the  dimensionality  of  the  problem  or  number  of  features  used,  will  ultimately 
determine  how  close  the  estimated  distribution  parameters  are  to  those  of  the  true 
distribution.  In  other  words,  in  a  problem  with  few  features,  fewer  training  exemplars 
would  be  needed  when  compared  to  the  requirements  of  a  problem  with  many  features. 
As  the  number  of  features  grows,  the  sample  size  of  the  training  set  must  also  grow 
(Fukunaga  and  Hayes,  1989). 

Sample size also plays a key part in comparing a linear classifier and a quadratic
classifier. If the covariance matrices for the two classes are equal and the true covariance
matrix is used, the quadratic classifier and the linear classifier are the same. However,
when estimated covariance matrices are used, the per-class estimates will not be the
same even though the true covariance matrices are equal. Since the linear classifier
uses all the data to calculate a single covariance matrix, it will provide a better
approximation of the true covariance matrix than the quadratic classifier, which
estimates a separate matrix for each class. The quadratic classifier would need much
more data to get as good an approximation as the linear classifier. Thus, in a case where the
covariance matrices are truly equal, the linear classifier is the more robust classifier
(Fukunaga and Hayes, 1989).
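
The covariance argument above can be checked with a small simulation. The sketch below is illustrative only: it assumes two classes with identity covariance, equal class sizes, and hypothetical settings (15 exemplars per class, 3 features, 200 trials), and it measures estimation error as the Frobenius distance from the true covariance matrix.

```python
import random

def sample_cov(rows):
    """Unbiased sample covariance matrix of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]

def frob_error(cov):
    """Frobenius distance between cov and the identity (the true covariance here)."""
    d = len(cov)
    return sum((cov[i][j] - (i == j)) ** 2 for i in range(d) for j in range(d)) ** 0.5

def compare(n=15, d=3, trials=200, seed=0):
    """Average error of per-class estimates versus the pooled estimate."""
    rng = random.Random(seed)
    err_sep = err_pool = 0.0
    for _ in range(trials):
        c0 = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
        c1 = [[rng.gauss(1.0, 1.0) for _ in range(d)] for _ in range(n)]
        s0, s1 = sample_cov(c0), sample_cov(c1)
        # Equal class sizes, so the pooled estimate is the average of the two.
        pooled = [[(s0[i][j] + s1[i][j]) / 2 for j in range(d)] for i in range(d)]
        err_sep += (frob_error(s0) + frob_error(s1)) / 2
        err_pool += frob_error(pooled)
    return err_sep / trials, err_pool / trials
```

Calling `compare()` returns the average error of the separate estimates followed by that of the pooled estimate; under these assumptions the pooled error is the smaller of the two.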

Chapter  Summary 

This  chapter  summarized  the  important  literature  used  to  conduct  this  thesis. 

First, Air Force guidance on data fusion was summarized, and the statistical
independence assumption was explored. Next, the four fusion methods employed in this
thesis were described in detail. Finally, some sample size considerations were discussed.




III.  Methodology 


Introduction 

This chapter lays out the basic methodology used in this thesis. First, it describes
the different types of correlation introduced into the fusion models. Next, it describes the
data generation process for each of the different problems explored. Next, the general
experimental design is discussed and some application issues for each of the four fusion
methods are detailed. Finally, feature selection, Total Probability of Misclassification
(TPM), and sample size variation are discussed.

Correlation 

In  this  thesis,  multiple  feature  sets,  each  containing  multiple  features,  are 
generated for experimentation. Some level of correlation is present among these features.
Two types of correlation are considered: inter-correlation and intra-correlation. The first
type of correlation considered is inter-correlation; this is the correlation between features
in a given data set. Figure 6 is a notional diagram representing inter-correlation.


[Figure 6 diagram: four feature columns (Feature 1, f1, through Feature 4, f4), each listing Exemplar 1 through Exemplar N, with correlation indicated between the feature columns.]

Figure 6: Inter-correlation.




The  second  type  of  correlation  considered  is  intra-correlation  or  autocorrelation; 
this  is  the  correlation  between  observations  in  a  single  feature.  Figure  7  is  a  notional 
diagram  representing  intra-correlation. 


[Figure 7 diagram: a single feature column (Feature 1, f1) with correlation indicated between its successive exemplars.]

Figure 7: Intra-correlation.

Data  Generation 

Because real-world data was not available, data was generated for a variety of
problems. Table 4 summarizes the problems analyzed in this thesis.




Table 4: Data Generation Descriptions.

Problem #  Problem Name                              Problem Description
1          4 Feature Case                            Recreates Storm work; average ROC curve of 5 runs as response
2          8 Feature Case                            Adds noise and redundant features to problem 1; changes mean of class 1
3          8 Feature with Autocorrelation Case       Adds autocorrelation to problem 2; changes mean of class 1
4          8 Feature Triangle Case                   Changes geometry of problem 2
5          8 Feature XOR Case                        Changes geometry of problem 4
6          8 Feature XOR with Autocorrelation Case   Adds autocorrelation to problem 5
7          20 Feature with Feature Selection Case    Adds more noise and redundant features to problem 2; explores only 2 sample sizes; changes mean of class 1
8          36 Feature with Feature Selection Case    Adds more noise and redundant features to problem 7; explores only 1 sample size and 1 correlation level
9          TPM Exploration                           Examines problem 1 at 3 specific levels of correlation

Problem  1:  4  Feature  Case 


Let $F = F_1 \times F_2 \subset \mathbb{R}^4$ where $F_1 \subset \mathbb{R}^2$ is the feature set observed by sensor 1, a
linear discriminant function, and $F_2 \subset \mathbb{R}^2$ is the feature set observed by sensor 2, a
quadratic discriminant function. The correlation of the data is given by

$$\Sigma^i = \begin{bmatrix} \Sigma^i_{F_1,F_1} & \Sigma^i_{F_1,F_2} \\ \Sigma^i_{F_2,F_1} & \Sigma^i_{F_2,F_2} \end{bmatrix}.$$

Since all the features in the individual feature sets are statistically independent,
[equation not recoverable from the source]
and $\Sigma^i_{F_j,F_k}$ is the correlation matrix between the features contained in feature set j and
feature set k in class i (i = 0,1; j, k = 1,2). Now, let $F_1^0$ be the features from feature set 1
in class 0 and $F_1^1$ be the features from feature set 1 in class 1 where $F_1 = F_1^0 \cup F_1^1$. Let
$\mu_1^0$ be the mean of feature set 1 in class 0 and $\mu_1^1$ be the mean of feature set 1 in class 1.
Let $F_1^0 \sim N(\mu_1^0, \Sigma^0_{F_1,F_1})$ and $F_1^1 \sim N(\mu_1^1, \Sigma^1_{F_1,F_1})$ where $\mu_1^0 = (0,0)^T$ and
$\mu_1^1 = (0.95,0.95)^T$. Let $F_2 = F_2^0 \cup F_2^1$ where $F_2^0 \sim N(\mu_2^0, \Sigma^0_{F_2,F_2})$ and $F_2^1 \sim N(\mu_2^1, \Sigma^1_{F_2,F_2})$
where $\mu_2^0 = (0,0)^T$ and $\mu_2^1 = (1.15,1.15)^T$.

Problem  2:  8  Feature  Case 

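Let $F = F_1 \times F_2 \subset \mathbb{R}^8$ where $F_1 \subset \mathbb{R}^4$ is the feature set observed by sensor 1, a
linear discriminant function, and $F_2 \subset \mathbb{R}^4$ is the feature set observed by sensor 2, a
quadratic discriminant function. The correlation of the data is given by

$$\Sigma^i = \begin{bmatrix} \Sigma^i_{F_1,F_1} & \Sigma^i_{F_1,F_2} \\ \Sigma^i_{F_2,F_1} & \Sigma^i_{F_2,F_2} \end{bmatrix}.$$

In this case, each feature set will contain 2 independent features (separated in mean), 1
redundant feature, and 1 noise feature (same mean).

$$\Sigma^i_{F_1,F_1} = \Sigma^i_{F_2,F_2} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & \rho_{red} & 0 \\ 0 & \rho_{red} & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Also,

$$\Sigma^i_{F_1,F_2} = \Sigma^i_{F_2,F_1} = \begin{bmatrix} 0 & \rho & \rho_{ind} & 0 \\ \rho & 0 & 0 & 0 \\ \rho_{ind} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix},$$

where $\rho \in \{0.0, 0.2, 0.4, 0.6, 0.8, 0.9\}$, $\rho_{red} = 0.95$ is the correlation level of the redundant
feature, and $\rho_{ind} = \rho \rho_{red}$ is the correlation level induced by $\rho$ and $\rho_{red}$. $\Sigma^i_{F_j,F_k}$ is the
correlation matrix between the features contained in feature set j and feature set k in class
i (i = 0,1; j, k = 1,2). Now, let $F_1^0$ be the features from feature set 1 in class 0 and $F_1^1$ be
the features from feature set 1 in class 1 where $F_1 = F_1^0 \cup F_1^1$. Let $\mu_1^0$ be the mean of
feature set 1 in class 0 and $\mu_1^1$ be the mean of feature set 1 in class 1. Let
$F_1^0 \sim N(\mu_1^0, \Sigma^0_{F_1,F_1})$ and $F_1^1 \sim N(\mu_1^1, \Sigma^1_{F_1,F_1})$ where $\mu_1^0 = (0,0,0,0)^T$ and
$\mu_1^1 = (0.50,0.50,0.50,0)^T$. Let $F_2 = F_2^0 \cup F_2^1$ where $F_2^0 \sim N(\mu_2^0, \Sigma^0_{F_2,F_2})$ and
$F_2^1 \sim N(\mu_2^1, \Sigma^1_{F_2,F_2})$ where $\mu_2^0 = (0,0,0,0)^T$ and $\mu_2^1 = (0.75,0.75,0.75,0)^T$.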
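
As an illustration of how the 8 feature correlation structure could be assembled and sampled, the following sketch builds the full 8 x 8 matrix from a within-set block and a cross-set block and draws one exemplar through a Cholesky factor. It is a sketch under stated assumptions: the block layout follows the matrices for this problem, the Cholesky-based sampler is a standard technique chosen here for illustration (the thesis does not state how samples were drawn), and all function names are hypothetical.

```python
import math
import random

RHO_LEVELS = (0.0, 0.2, 0.4, 0.6, 0.8, 0.9)

def correlation_matrix(rho, rho_red=0.95):
    """8 x 8 matrix [[W, C], [C, W]] from within-set block W and cross-set block C."""
    rho_ind = rho * rho_red                      # correlation induced by rho and rho_red
    w = [[1, 0, 0, 0], [0, 1, rho_red, 0], [0, rho_red, 1, 0], [0, 0, 0, 1]]
    c = [[0, rho, rho_ind, 0], [rho, 0, 0, 0], [rho_ind, 0, 0, 0], [0, 0, 0, 0]]
    return [w[i] + c[i] for i in range(4)] + [c[i] + w[i] for i in range(4)]

def cholesky(m):
    """Lower-triangular L with L L^T = m; raises ValueError if m is not positive definite."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

def draw(chol, mean, rng):
    """One exemplar x = mean + L w with w ~ N(0, I)."""
    w = [rng.gauss(0.0, 1.0) for _ in mean]
    return [mu + sum(chol[i][k] * w[k] for k in range(i + 1))
            for i, mu in enumerate(mean)]
```

A class 1 exemplar for this problem would then be drawn with the mean `[0.5, 0.5, 0.5, 0, 0.75, 0.75, 0.75, 0]`, that is, the sensor 1 mean followed by the sensor 2 mean.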


Problem  3:  8  Feature  with  Autocorrelation  Case 


Let $F = F_1 \times F_2 \subset \mathbb{R}^8$ where $F_1 \subset \mathbb{R}^4$ is the feature set observed by sensor 1, a
linear discriminant function, and $F_2 \subset \mathbb{R}^4$ is the feature set observed by sensor 2, a
quadratic discriminant function. The correlation of the data is given by

$$\Sigma^i = \begin{bmatrix} \Sigma^i_{F_1,F_1} & \Sigma^i_{F_1,F_2} \\ \Sigma^i_{F_2,F_1} & \Sigma^i_{F_2,F_2} \end{bmatrix}.$$

In this case, each feature set will contain 2 independent features (separated in mean), 1
redundant feature, and 1 noise feature (same mean).

$$\Sigma^i_{F_1,F_1} = \Sigma^i_{F_2,F_2} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & \rho_{red} & 0 \\ 0 & \rho_{red} & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Also,

$$\Sigma^i_{F_1,F_2} = \Sigma^i_{F_2,F_1} = \begin{bmatrix} 0 & \rho & \rho_{ind} & 0 \\ \rho & 0 & 0 & 0 \\ \rho_{ind} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix},$$

where $\rho \in \{0.0, 0.2, 0.4, 0.6, 0.8, 0.9\}$, $\rho_{red} = 0.95$ is the correlation level of the redundant
feature, and $\rho_{ind} = \rho \rho_{red}$ is the correlation level induced by $\rho$ and $\rho_{red}$. $\Sigma^i_{F_j,F_k}$ is the
correlation matrix between the features contained in feature set j and feature set k in class
i (i = 0,1; j, k = 1,2). Now, let $F_1^0$ be the features from feature set 1 in class 0 and $F_1^1$ be
the features from feature set 1 in class 1 where $F_1 = F_1^0 \cup F_1^1$. Let $\mu_1^0$ be the mean of
feature set 1 in class 0 and $\mu_1^1$ be the mean of feature set 1 in class 1. Let
$F_1^0 \sim N(\mu_1^0, \Sigma^0_{F_1,F_1})$ and $F_1^1 \sim N(\mu_1^1, \Sigma^1_{F_1,F_1})$ where $\mu_1^0 = (0,0,0,0)^T$ and
$\mu_1^1 = (0.95,0.95,0.95,0)^T$. Let $F_2 = F_2^0 \cup F_2^1$ where $F_2^0 \sim N(\mu_2^0, \Sigma^0_{F_2,F_2})$ and
$F_2^1 \sim N(\mu_2^1, \Sigma^1_{F_2,F_2})$ where $\mu_2^0 = (0,0,0,0)^T$ and $\mu_2^1 = (1.15,1.15,1.15,0)^T$. This adds the
appropriate level of correlation between feature sets. In addition, $\rho_{auto} \in \{0.0, 0.5, 0.9\}$ is
the level of autocorrelation within a feature set. Let $z(t) \in \mathbb{R}^8$, $t \in \{1, 2, \ldots, N\}$, where N is
the number of training exemplars, be one exemplar in the feature space where
$z(t) \sim N(0, \Sigma^0)$; it is one row of the matrix of features described above. Let
$A = \rho_{auto} I$, $B = \sqrt{1 - \rho_{auto}^2}\, I$, and $\varepsilon(t) \sim N(0, B \Sigma^0 B)$ for each t. Then,

$$z(t) = A z(t-1) + \varepsilon(t)$$

(Laine, 2003). Once the appropriate number of exemplars has
been generated, the means can be added to the corresponding classes.
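
The autocorrelated generation step can be sketched as follows, with $A = \rho_{auto} I$ and noise covariance $B \Sigma B$, $B = \sqrt{1 - \rho_{auto}^2}\, I$. Two points are assumptions made for illustration and are not stated in the text: the first exemplar is drawn directly from $N(0, \Sigma)$ so that the sequence starts in its stationary distribution, and $\Sigma$ is supplied through its lower-triangular Cholesky factor `chol`. The function name `ar1_exemplars` is hypothetical.

```python
import math
import random

def ar1_exemplars(chol, rho_auto, n, rng):
    """Generate n rows with z(t) = A z(t-1) + eps(t).

    A = rho_auto * I and eps(t) ~ N(0, B Sigma B) with B = sqrt(1 - rho_auto^2) * I,
    where Sigma = chol chol^T.
    """
    d = len(chol)
    b = math.sqrt(1.0 - rho_auto ** 2)

    def draw():
        # One N(0, Sigma) vector via the lower-triangular Cholesky factor.
        w = [rng.gauss(0.0, 1.0) for _ in range(d)]
        return [sum(chol[i][k] * w[k] for k in range(i + 1)) for i in range(d)]

    z = draw()                      # assumed start: the stationary distribution
    rows = [z]
    for _ in range(n - 1):
        e = draw()
        # Scaling an N(0, Sigma) draw by b gives covariance b^2 Sigma = B Sigma B.
        z = [rho_auto * zi + b * ei for zi, ei in zip(z, e)]
        rows.append(z)
    return rows
```

With this choice of A and B the covariance of each row stays at $\Sigma$, since $\rho_{auto}^2 \Sigma + (1 - \rho_{auto}^2) \Sigma = \Sigma$, while successive rows of any one feature have correlation $\rho_{auto}$. Class means would be added to the rows afterward, as the text describes.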

Problem  4:  8  Feature  Triangle  Case 

Every problem up to this point is a fairly simple problem separating two
multivariate normal populations. The Triangle problem is a slightly more complicated
problem building toward the XOR problem. It is interesting to see how each of the
fusion methods will perform in the face of this more complicated problem. Each class
will contain two multivariate populations; thus, four multivariate populations will be
generated. Two will be assigned to one class and two to the other class. All four
multivariate distributions will have the same covariance structure. Let
$F = F_1 \times F_2 \subset \mathbb{R}^8$ where $F_1 \subset \mathbb{R}^4$ is the feature set observed by sensor 1, a linear
discriminant function, and $F_2 \subset \mathbb{R}^4$ is the feature set observed by sensor 2, a quadratic
discriminant function. The correlation of the data is given by

$$\Sigma^i = \begin{bmatrix} \Sigma^i_{F_1,F_1} & \Sigma^i_{F_1,F_2} \\ \Sigma^i_{F_2,F_1} & \Sigma^i_{F_2,F_2} \end{bmatrix}.$$

In this case, each feature set will contain 2 independent features (separated in mean), 1
redundant feature, and 1 noise feature (same mean).

$$\Sigma^i_{F_1,F_1} = \Sigma^i_{F_2,F_2} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & \rho_{red} & 0 \\ 0 & \rho_{red} & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Also,

$$\Sigma^i_{F_1,F_2} = \Sigma^i_{F_2,F_1} = \begin{bmatrix} 0 & \rho & \rho_{ind} & 0 \\ \rho & 0 & 0 & 0 \\ \rho_{ind} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix},$$

where $\rho \in \{0.0, 0.2, 0.4, 0.6, 0.8, 0.9\}$, $\rho_{red} = 0.95$ is the correlation level of the redundant
feature, and $\rho_{ind} = \rho \rho_{red}$ is the correlation level induced by $\rho$ and $\rho_{red}$. $\Sigma^i_{F_j,F_k}$ is the
correlation matrix between the features contained in feature set j and feature set k in class
i (i = 0,1; j, k = 1,2). Now, let $F_1^{01}$ be the first set of features from feature set 1 in class 0,
$F_1^{02}$ be the second set of features from feature set 1 in class 0, $F_1^{11}$ be the first set of
features from feature set 1 in class 1, and $F_1^{12}$ be the second set of features from feature
set 1 in class 1 where $F_1^0 = F_1^{01} \cup F_1^{02}$, $F_1^1 = F_1^{11} \cup F_1^{12}$, and $F_1 = F_1^0 \cup F_1^1$. Let $\mu_1^{01}$ be
the mean of the first set of features in feature set 1 in class 0, $\mu_1^{02}$ be the mean of the
second set of features in feature set 1 in class 0, $\mu_1^{11}$ be the mean of the first set of
features in feature set 1 in class 1, and $\mu_1^{12}$ be the mean of the second set of features in
feature set 1 in class 1. Let $F_1^{01} \sim N(\mu_1^{01}, \Sigma^0_{F_1,F_1})$, $F_1^{02} \sim N(\mu_1^{02}, \Sigma^0_{F_1,F_1})$,
$F_1^{11} \sim N(\mu_1^{11}, \Sigma^1_{F_1,F_1})$, and $F_1^{12} \sim N(\mu_1^{12}, \Sigma^1_{F_1,F_1})$ where $\mu_1^{01} = (0,0,0,0)^T$,
$\mu_1^{02} = (0.95,0.95,0.95,0)^T$, $\mu_1^{11} = (0.95,0,0,0)^T$, and $\mu_1^{12} = (0.95,0,0,0)^T$.

Let $F_2^0 = F_2^{01} \cup F_2^{02}$, $F_2^1 = F_2^{11} \cup F_2^{12}$, and $F_2 = F_2^0 \cup F_2^1$
where $F_2^{01} \sim N(\mu_2^{01}, \Sigma^0_{F_2,F_2})$, $F_2^{02} \sim N(\mu_2^{02}, \Sigma^0_{F_2,F_2})$, $F_2^{11} \sim N(\mu_2^{11}, \Sigma^1_{F_2,F_2})$, and
$F_2^{12} \sim N(\mu_2^{12}, \Sigma^1_{F_2,F_2})$ where $\mu_2^{01} = (0,0,0,0)^T$, $\mu_2^{02} = (1.15,1.15,1.15,0)^T$,
$\mu_2^{11} = (1.15,0,0,0)^T$, and $\mu_2^{12} = (1.15,0,0,0)^T$.




Problem 5: 8 Feature XOR Case

Every problem up to this point is a fairly simple problem separating two
multivariate normal populations. The XOR problem is a more complicated problem. It is
interesting to see how each of the fusion methods will perform in the face of this more
complicated problem. Each class will contain two multivariate populations; thus, four
multivariate populations will be generated. Two will be assigned to one class and two to
the other class. All four multivariate distributions will have the same covariance
structure. Let $F = F_1 \times F_2 \subset \mathbb{R}^8$ where $F_1 \subset \mathbb{R}^4$ is the feature set observed by sensor 1, a
linear discriminant function, and $F_2 \subset \mathbb{R}^4$ is the feature set observed by sensor 2, a
quadratic discriminant function. The correlation of the data is given by

$$\Sigma^i = \begin{bmatrix} \Sigma^i_{F_1,F_1} & \Sigma^i_{F_1,F_2} \\ \Sigma^i_{F_2,F_1} & \Sigma^i_{F_2,F_2} \end{bmatrix}.$$

In this case, each feature set will contain 2 independent features (separated in mean), 1
redundant feature, and 1 noise feature (same mean).

$$\Sigma^i_{F_1,F_1} = \Sigma^i_{F_2,F_2} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & \rho_{red} & 0 \\ 0 & \rho_{red} & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Also,

$$\Sigma^i_{F_1,F_2} = \Sigma^i_{F_2,F_1} = \begin{bmatrix} 0 & \rho & \rho_{ind} & 0 \\ \rho & 0 & 0 & 0 \\ \rho_{ind} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix},$$

where $\rho \in \{0.0, 0.2, 0.4, 0.6, 0.8, 0.9\}$, $\rho_{red} = 0.95$ is the correlation level of the redundant
feature, and $\rho_{ind} = \rho \rho_{red}$ is the correlation level induced by $\rho$ and $\rho_{red}$. $\Sigma^i_{F_j,F_k}$ is the
correlation matrix between the features contained in feature set j and feature set k in class
i (i = 0,1; j, k = 1,2). Now, let $F_1^{01}$ be the first set of features from feature set 1 in class 0,
$F_1^{02}$ be the second set of features from feature set 1 in class 0, $F_1^{11}$ be the first set of
features from feature set 1 in class 1, and $F_1^{12}$ be the second set of features from feature
set 1 in class 1 where $F_1^0 = F_1^{01} \cup F_1^{02}$, $F_1^1 = F_1^{11} \cup F_1^{12}$, and $F_1 = F_1^0 \cup F_1^1$. Let $\mu_1^{01}$ be
the mean of the first set of features in feature set 1 in class 0, $\mu_1^{02}$ be the mean of the
second set of features in feature set 1 in class 0, $\mu_1^{11}$ be the mean of the first set of
features in feature set 1 in class 1, and $\mu_1^{12}$ be the mean of the second set of features in
feature set 1 in class 1. Let $F_1^{01} \sim N(\mu_1^{01}, \Sigma^0_{F_1,F_1})$, $F_1^{02} \sim N(\mu_1^{02}, \Sigma^0_{F_1,F_1})$,
$F_1^{11} \sim N(\mu_1^{11}, \Sigma^1_{F_1,F_1})$, and $F_1^{12} \sim N(\mu_1^{12}, \Sigma^1_{F_1,F_1})$ where $\mu_1^{01} = (0,0,0,0)^T$,
$\mu_1^{02} = (0.95,0.95,0.95,0)^T$, $\mu_1^{11} = (0,0.95,0.95,0)^T$, and $\mu_1^{12} = (0.95,0,0,0)^T$.

Let $F_2^0 = F_2^{01} \cup F_2^{02}$, $F_2^1 = F_2^{11} \cup F_2^{12}$, and $F_2 = F_2^0 \cup F_2^1$
where $F_2^{01} \sim N(\mu_2^{01}, \Sigma^0_{F_2,F_2})$, $F_2^{02} \sim N(\mu_2^{02}, \Sigma^0_{F_2,F_2})$, $F_2^{11} \sim N(\mu_2^{11}, \Sigma^1_{F_2,F_2})$, and
$F_2^{12} \sim N(\mu_2^{12}, \Sigma^1_{F_2,F_2})$ where $\mu_2^{01} = (0,0,0,0)^T$, $\mu_2^{02} = (1.15,1.15,1.15,0)^T$,
$\mu_2^{11} = (0,1.15,1.15,0)^T$, and $\mu_2^{12} = (1.15,0,0,0)^T$.
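
The XOR geometry for sensor 1 can be illustrated with a short sketch that draws the four populations and labels them by class. It is deliberately simplified: features are drawn independently with unit variance (the correlation structure of this problem is ignored here), and the assignment of the four listed means to the populations follows the order in which they are listed, which is an assumption. The names `MEANS_SENSOR_1` and `xor_sample` are hypothetical.

```python
import random

# Sensor 1 population means, keyed by (class label, population number).
# The pairing of means with populations follows the listed order (assumed).
MEANS_SENSOR_1 = {
    (0, 1): (0.0, 0.0, 0.0, 0.0),       # class 0, first population
    (0, 2): (0.95, 0.95, 0.95, 0.0),    # class 0, second population
    (1, 1): (0.0, 0.95, 0.95, 0.0),     # class 1, first population
    (1, 2): (0.95, 0.0, 0.0, 0.0),      # class 1, second population
}

def xor_sample(n_per_population, rng):
    """Return a list of (feature_vector, class_label) pairs for the four populations."""
    data = []
    for (label, _), mean in MEANS_SENSOR_1.items():
        for _ in range(n_per_population):
            # Simplification: independent unit-variance features around each mean.
            data.append(([rng.gauss(m, 1.0) for m in mean], label))
    return data
```

Under this layout the two classes share the same overall mean in every feature (0.475 for the first three features of sensor 1), which is what makes the problem hard for a classifier that relies on a single separating direction.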


Problem  6:  8  Feature  XOR  with  Autocorrelation  Case 


This problem adds autocorrelation to the 8 feature XOR Problem. Again, each
class will contain two multivariate populations; thus, four multivariate populations will
be generated. Two will be assigned to one class and two to the other class. All four
multivariate distributions will have the same covariance structure. Let
$F = F_1 \times F_2 \subset \mathbb{R}^8$ where $F_1 \subset \mathbb{R}^4$ is the feature set observed by sensor 1, a linear
discriminant function, and $F_2 \subset \mathbb{R}^4$ is the feature set observed by sensor 2, a quadratic
discriminant function. The correlation of the data is given by

$$\Sigma^i = \begin{bmatrix} \Sigma^i_{F_1,F_1} & \Sigma^i_{F_1,F_2} \\ \Sigma^i_{F_2,F_1} & \Sigma^i_{F_2,F_2} \end{bmatrix}.$$

In this case, each feature set will contain 2 independent features (separated in mean), 1
redundant feature, and 1 noise feature (same mean).

$$\Sigma^i_{F_1,F_1} = \Sigma^i_{F_2,F_2} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & \rho_{red} & 0 \\ 0 & \rho_{red} & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Also,

$$\Sigma^i_{F_1,F_2} = \Sigma^i_{F_2,F_1} = \begin{bmatrix} 0 & \rho & \rho_{ind} & 0 \\ \rho & 0 & 0 & 0 \\ \rho_{ind} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix},$$

where $\rho \in \{0.0, 0.2, 0.4, 0.6, 0.8, 0.9\}$, $\rho_{red} = 0.95$ is the correlation level of the redundant
feature, and $\rho_{ind} = \rho \rho_{red}$ is the correlation level induced by $\rho$ and $\rho_{red}$. $\Sigma^i_{F_j,F_k}$ is the
correlation matrix between the features contained in feature set j and feature set k in class
i (i = 0,1; j, k = 1,2). Now, let $F_1^{01}$ be the first set of features from feature set 1 in class 0,
$F_1^{02}$ be the second set of features from feature set 1 in class 0, $F_1^{11}$ be the first set of
features from feature set 1 in class 1, and $F_1^{12}$ be the second set of features from feature
set 1 in class 1 where $F_1^0 = F_1^{01} \cup F_1^{02}$, $F_1^1 = F_1^{11} \cup F_1^{12}$, and $F_1 = F_1^0 \cup F_1^1$. Let $\mu_1^{01}$ be
the mean of the first set of features in feature set 1 in class 0, $\mu_1^{02}$ be the mean of the
second set of features in feature set 1 in class 0, $\mu_1^{11}$ be the mean of the first set of
features in feature set 1 in class 1, and $\mu_1^{12}$ be the mean of the second set of features in
feature set 1 in class 1. Let $F_1^{01} \sim N(\mu_1^{01}, \Sigma^0_{F_1,F_1})$, $F_1^{02} \sim N(\mu_1^{02}, \Sigma^0_{F_1,F_1})$,
$F_1^{11} \sim N(\mu_1^{11}, \Sigma^1_{F_1,F_1})$, and $F_1^{12} \sim N(\mu_1^{12}, \Sigma^1_{F_1,F_1})$ where $\mu_1^{01} = (0,0,0,0)^T$,
$\mu_1^{02} = (0.95,0.95,0.95,0)^T$, $\mu_1^{11} = (0,0.95,0.95,0)^T$, and $\mu_1^{12} = (0.95,0,0,0)^T$.

Let $F_2^0 = F_2^{01} \cup F_2^{02}$, $F_2^1 = F_2^{11} \cup F_2^{12}$, and $F_2 = F_2^0 \cup F_2^1$
where $F_2^{01} \sim N(\mu_2^{01}, \Sigma^0_{F_2,F_2})$, $F_2^{02} \sim N(\mu_2^{02}, \Sigma^0_{F_2,F_2})$, $F_2^{11} \sim N(\mu_2^{11}, \Sigma^1_{F_2,F_2})$, and
$F_2^{12} \sim N(\mu_2^{12}, \Sigma^1_{F_2,F_2})$ where $\mu_2^{01} = (0,0,0,0)^T$, $\mu_2^{02} = (1.15,1.15,1.15,0)^T$,
$\mu_2^{11} = (0,1.15,1.15,0)^T$, and $\mu_2^{12} = (1.15,0,0,0)^T$. This adds the appropriate level of
correlation between feature sets. In addition, $\rho_{auto} \in \{0.0, 0.5, 0.9\}$ is the level of
autocorrelation within a feature set. Let $z_{01}(t) \in \mathbb{R}^8$, $t \in \{1, 2, \ldots, N\}$, where N is the
number of training exemplars, be one exemplar in the feature space for the first set of
features in class 0 where $z_{01}(t) \sim N(0, \Sigma^0)$ for all t; it is one row of the matrix of features
described above. Define $z_{02}(t)$, $z_{11}(t)$, and $z_{12}(t)$ in the same way for the second set of
features in class 0, the first set of features in class 1, and the second set of features in
class 1, respectively, each distributed $N(0, \Sigma^0)$ for all t. Let $A = \rho_{auto} I$,
$B = \sqrt{1 - \rho_{auto}^2}\, I$, and let $\varepsilon_{01}(t)$, $\varepsilon_{02}(t)$, $\varepsilon_{11}(t)$, and $\varepsilon_{12}(t)$ each be distributed
$N(0, B \Sigma^0 B)$. Then,

$$z_{01}(t) = A z_{01}(t-1) + \varepsilon_{01}(t), \quad z_{02}(t) = A z_{02}(t-1) + \varepsilon_{02}(t),$$
$$z_{11}(t) = A z_{11}(t-1) + \varepsilon_{11}(t), \quad z_{12}(t) = A z_{12}(t-1) + \varepsilon_{12}(t)$$

(Laine, 2003). Once the appropriate number of
exemplars has been generated, the means can be added to the corresponding populations,
and the populations can be grouped together into the appropriate classes. Class 0 is
composed of $z_{01}(t)$ and $z_{02}(t)$; Class 1 is composed of $z_{11}(t)$ and $z_{12}(t)$.


Problem  7:  20  Feature  with  Feature  Selection  Case 


Let  F  =  FyxF^  (z  where  Fy  c  is  the  feature  set  observed  by  sensor  1  and 


,10 


F^  (z.  R  is  the  feature  set  observed  by  sensor  2.  The  correlation  of  the  data  is  given  by 


E'  = 


2) 


F,,F,  ^F.Fj 


E' 


Fi,F,  ^F2,F^ 


In  this  case,  each  feature  set  will  contain  2  independent  features  (separated  in  mean),  4 
redundant  features,  and  4  noise  features  (same  mean). 


36 


1 

0 

Predl 

Predl 

0 

0 

!  0 

0 

0 

o' 

0 

1 

0 

0 

Predl 

PredA 

!  0 

0 

0 

0 

Pred\ 

0 

1 

0 

0 

0 

0 

0 

0 

0 

Predl 

0 

0 

1 

0 

0 

1  0 

1 

0 

0 

0 

0 

PredT, 

0 

0 

1 

0 

!  0 

1 

0 

0 

0 

0 

PredA 

0 

0 

0 

1 

!  0 

l_ 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 1 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 

Also, 


0 

p 

0 

0 

Pindl 

PindA 

!  0 

0 

0 

0 

p 

0 

Pindl 

Pindl 

0 

0 

!  0 

0 

0 

0 

0 

P  ind  1 

0 

0 

0 

0 

0 

0 

0 

0 

0 

Pindl 

0 

0 

0 

0 

1  0 

1 

0 

0 

0 

P  ind  3 

0 

0 

0 

0 

0 

!  0 

1 

0 

0 

0 

PindA 

0 

0 

0 

0 

0 

!  0 

_L 

0 

0 

0 

0 

0 

0 

0 

0 

0 

!  0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

where $\rho \in \{0.0, 0.2, 0.4, 0.6, 0.8, 0.9\}$; $\rho_{red,1} = 0.60$, $\rho_{red,2} = 0.50$, $\rho_{red,3} = 0.40$, $\rho_{red,4} = 0.30$ are the correlation levels of the redundant features; and $\rho_{ind,m} = \rho \, \rho_{red,m}$ are the correlation levels induced by $\rho$ and $\rho_{red,m}$ for all $m = 1, 2, 3, 4$. $\Sigma^{i}_{F_j,F_k}$ is the correlation matrix between the features contained in feature set $j$ and feature set $k$ in class $i$ ($i = 0, 1$; $j, k = 1, 2$).

Now, let $F_1^0$ be the features from feature set 1 in class 0 and $F_1^1$ be the features from feature set 1 in class 1, where $F_1 = F_1^0 \cup F_1^1$. Let $\mu_1^0$ be the mean of feature set 1 in class 0 and $\mu_1^1$ be the mean of feature set 1 in class 1. Let $F_1^0 \sim N(\mu_1^0, \Sigma^{0}_{F_1,F_1})$ and $F_1^1 \sim N(\mu_1^1, \Sigma^{1}_{F_1,F_1})$ where $\mu_1^0 = (0,0,0,0,0,0,0,0,0,0)^T$ and $\mu_1^1 = (0.5,0.5,0.5,0.5,0.5,0.5,0,0,0,0)^T$. Let $F_2 = F_2^0 \cup F_2^1$ where $F_2^0 \sim N(\mu_2^0, \Sigma^{0}_{F_2,F_2})$ and $F_2^1 \sim N(\mu_2^1, \Sigma^{1}_{F_2,F_2})$, with $\mu_2^0 = (0,0,0,0,0,0,0,0,0,0)^T$ and $\mu_2^1 = (0.75,0.75,0.75,0.75,0.75,0.75,0,0,0,0)^T$.

Two different sample sizes were used for this problem: 50 exemplars in each class and 1000 exemplars in each class. Three data sets were generated for each sample size and used as shown in the process flow diagrams above.
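The data generation described above can be illustrated with a minimal sketch, assuming a plain Cholesky factorization is used to impose the correlation structure (the thesis does not state which method was used). It builds only the within-set correlation matrix of one 10-feature set and draws class 1 exemplars for feature set 1; the across-set correlation $\rho$ is left out for brevity. The function names `within_set_correlation`, `cholesky`, and `sample_class` are hypothetical.

```python
import math
import random

RHO_RED = [0.60, 0.50, 0.40, 0.30]  # redundant-feature correlations from the text


def within_set_correlation(rho_red):
    """10 x 10 correlation matrix: 2 independent, 4 redundant, 4 noise features."""
    n = 10
    sigma = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    # Feature 0 is tied to redundant features 2 and 3; feature 1 to features 4 and 5.
    pairs = [(0, 2, rho_red[0]), (0, 3, rho_red[1]),
             (1, 4, rho_red[2]), (1, 5, rho_red[3])]
    for a, b, rho in pairs:
        sigma[a][b] = sigma[b][a] = rho
    return sigma


def cholesky(a):
    """Lower-triangular factor L with L * L^T = a (assumed method, not from the thesis)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def sample_class(mean, low, n_exemplars, rng):
    """Draw exemplars x = mean + L z, with z standard normal."""
    dim = len(mean)
    rows = []
    for _ in range(n_exemplars):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        rows.append([mean[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                     for i in range(dim)])
    return rows


rng = random.Random(0)  # fixed seed, chosen only for repeatability
sigma = within_set_correlation(RHO_RED)
low = cholesky(sigma)
mu_class1 = [0.5] * 6 + [0.0] * 4  # class 1 mean of feature set 1
class1 = sample_class(mu_class1, low, 50, rng)
```

The same routine would apply to the full 20 x 20 matrix once the across-set block is filled in for a chosen $\rho$.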

In this problem, after the data was generated as described above, feature selection
was performed. To perform feature selection, discriminant loadings were calculated for
each feature. In this project, the loading is defined to be the correlation between the
feature and the posterior probability of being in class 1. After the loadings were
calculated, any feature with a loading greater than 0.45 was kept as a good feature. In the
small sample size problem, there were cases where none of the loadings were larger than
0.45. In these cases, only the feature with the largest loading was considered a good
feature and kept for the remainder of the analysis. Then, discriminant analysis and
classifier fusion were redone using only those good features. For comparison, analysis
was also done without feature selection, where all the features were kept and used for the
discriminant analysis and sensor fusion. These two results were compared. This process
was completed for each of the six correlation levels, each of the two sample sizes (50
exemplars in each class and 1000 exemplars in each class), with and without feature
selection, over 15 runs; there were a total of 360 runs in this first experiment.
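The loading-based selection rule above can be sketched as follows. This is an illustration only: it assumes the loading is the Pearson correlation and that the raw (signed) value is compared with the 0.45 cutoff, as the text states. The names `pearson` and `select_features` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant column carries no information
    return sxy / math.sqrt(sxx * syy)


def select_features(feature_columns, posterior, cutoff=0.45):
    """Keep features whose loading exceeds the cutoff.

    feature_columns: one list of values per feature.
    posterior: posterior probability of class 1 for each exemplar.
    If no loading exceeds the cutoff, keep only the largest one.
    """
    loadings = [pearson(col, posterior) for col in feature_columns]
    kept = [i for i, value in enumerate(loadings) if value > cutoff]
    if not kept:
        kept = [max(range(len(loadings)), key=lambda i: loadings[i])]
    return kept, loadings


posterior = [0.1, 0.2, 0.8, 0.9]
columns = [[1.0, 2.0, 8.0, 9.0],    # tracks the posterior closely
           [1.0, -1.0, 1.0, -1.0]]  # unrelated to the posterior
kept, loadings = select_features(columns, posterior)
```

Here `kept` holds the indices of the features passed on to the discriminant analysis.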




Problem 8: 36 Feature with Feature Selection Case


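Let $F = F_1 \times F_2 \subset \mathbb{R}^{36}$ where $F_1 \subset \mathbb{R}^{18}$ is the feature set observed by sensor 1 and $F_2 \subset \mathbb{R}^{18}$ is the feature set observed by sensor 2. The correlation of the data is given by

$$
\Sigma^{i} =
\begin{bmatrix}
\Sigma^{i}_{F_1,F_1} & \Sigma^{i}_{F_1,F_2} \\
\Sigma^{i}_{F_2,F_1} & \Sigma^{i}_{F_2,F_2}
\end{bmatrix}
$$

In this case, each feature set will contain 2 independent features (separated in mean), 8
redundant features, and 8 noise features (same mean). The blocks are

$$
\Sigma^{i}_{F_j,F_j} =
\begin{bmatrix} M_{11} & 0 \\ 0 & I \end{bmatrix},
\qquad
\Sigma^{i}_{F_1,F_2} =
\begin{bmatrix} M_{12} & 0 \\ 0 & 0 \end{bmatrix}
$$

where

$$
M_{11} =
\begin{bmatrix}
1 & 0 & \rho_{red,1} & \rho_{red,2} & \rho_{red,3} & \rho_{red,4} & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & \rho_{red,5} & \rho_{red,6} & \rho_{red,7} & \rho_{red,8} \\
\rho_{red,1} & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\rho_{red,2} & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
\rho_{red,3} & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
\rho_{red,4} & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & \rho_{red,5} & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & \rho_{red,6} & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & \rho_{red,7} & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & \rho_{red,8} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
$$

[$M_{12}$: 10 x 10 matrix containing the induced correlations $\rho_{ind,1}, \ldots, \rho_{ind,8}$; its exact layout is not recoverable from the source scan.]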


where $\rho = 0.8$; $\rho_{red,1} = \rho_{red,5}$, $\rho_{red,2} = \rho_{red,6}$, $\rho_{red,3} = \rho_{red,7}$ [values illegible in the source], and $\rho_{red,4} = \rho_{red,8} = 0.20$ are the correlation levels of the redundant features; and $\rho_{ind,m} = \rho \, \rho_{red,m}$ are the correlation levels induced by $\rho$ and $\rho_{red,m}$ for all $m = 1, \ldots, 8$. $\Sigma^{i}_{F_j,F_k}$ is the correlation matrix between the features contained in feature set $j$ and feature set $k$ in class $i$ ($i = 0, 1$; $j, k = 1, 2$).

Now, let $F_1^0$ be the features from feature set 1 in class 0 and $F_1^1$ be the features from feature set 1 in class 1, where $F_1 = F_1^0 \cup F_1^1$. Let $\mu_1^0$ be the mean of feature set 1 in class 0 and $\mu_1^1$ be the mean of feature set 1 in class 1. Let $F_1^0 \sim N(\mu_1^0, \Sigma^{0}_{F_1,F_1})$ and $F_1^1 \sim N(\mu_1^1, \Sigma^{1}_{F_1,F_1})$ where $\mu_1^0 = (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)^T$ and $\mu_1^1 = (0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0,0,0,0,0,0,0,0)^T$. Let $F_2 = F_2^0 \cup F_2^1$ where $F_2^0 \sim N(\mu_2^0, \Sigma^{0}_{F_2,F_2})$ and $F_2^1 \sim N(\mu_2^1, \Sigma^{1}_{F_2,F_2})$, with $\mu_2^0 = (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)^T$ and $\mu_2^1 = (0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0.75,0,0,0,0,0,0,0,0)^T$.

Only the low sample size was used for this experiment. Three data sets were generated and used as
shown in the process flow diagram in Figure 1.

In this problem, the feature set was expanded to examine feature selection more
thoroughly, using a single run of the same process at one sample size (50 exemplars in each
class) and one level of correlation ($\rho = 0.8$). In this case, data was generated as
described  above,  and  discriminant  loadings  were  calculated  in  the  same  manner  as  in  the 
first  feature  selection  problem.  In  this  problem,  the  classification  accuracy  is  defined  to 
be  the  sum  of  the  true  positive  values  and  true  negative  values  (the  sum  of  the  correct 
classifications).  The  number  of  features  was  reduced  from  all  18  features  for  each 
classifier  to  1  feature  for  each  classifier.  For  each  number  of  features,  the  classification 
accuracy  was  calculated  for  each  fusion  method. 

Experimental  Design 

In  the  data  generation,  three  data  sets  were  generated.  For  all  of  the  problems 
described  in  Table  4,  the  same  general  experimental  design  was  followed.  The  only 
difference  was  in  the  data  generation  phase  of  the  process.  The  first  data  set  was  used  to 
train  the  individual  classifiers,  the  linear  and  quadratic  discriminant  functions.  Once  the 
individual  classifiers  were  trained,  the  second  data  set  was  used  to  validate  the  individual 
classifiers.  Posterior  probabilities  were  calculated  from  the  second  and  third  data  sets  for 
later  use.  The  second  data  set  posterior  probabilities,  in  addition  to  being  validation  data 
for  the  individual  classifiers,  were  used  to  train  the  fusion  methods.  The  posterior 
probabilities  from  the  third  data  set  were  used  to  validate  the  fusion  methods.  All  of  the 
plots  generated  are  results  from  the  third  data  set. 




ISOC  Application 

The  ISOC  Fusion  model  takes  a  given  threshold,  0.5,  and  determines  the  optimal 
fusion  rule.  Using  the  methodology  described  in  the  Literature  Review,  the  optimal  rule 
was  calculated  from  the  posterior  probabilities  from  the  second  data  set.  After  the 
optimal  rule  was  calculated,  the  thresholds  for  both  individual  classifiers  were  varied 
together  from  0.0  to  1.0,  and  the  optimal  rule  was  applied  on  the  independent  posterior 
probabilities  from  the  third  data  set.  The  false  positive  values  were  plotted  against  the 
true  positive  values  as  the  thresholds  were  varied;  the  result  is  six  ISOC  curves,  one  for 
each  level  of  correlation.  This  process  was  replicated,  and  the  average  ISOC  curve  from 
the  replications  was  calculated. 

ROC  “Within”  OR  Application 

The  ROC  “Within”  Fusion  model  takes  a  given  rule,  the  “Logical  OR”  rule,  and 
determines  the  optimal  thresholds  for  each  individual  classifier.  Using  the  methodology 
described  in  the  Literature  Review,  the  optimal  threshold  pairs  were  calculated  from  the 
posterior  probabilities  from  the  second  data  set.  After  the  optimal  threshold  pairs  were 
calculated,  the  optimal  threshold  pairs  along  with  the  “Logical  OR”  rule  were  applied  to 
the  independent  posterior  probabilities  from  the  third  data  set.  The  false  positive  values 
were  plotted  against  the  true  positive  values  as  the  thresholds  were  varied;  the  result  is  six 
ROC  curves,  one  for  each  level  of  correlation.  This  process  was  replicated,  and  the 
average  ROC  curve  from  the  replications  was  calculated. 
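As an illustration of how a threshold pair and the “Logical OR” rule give one point on the fused ROC curve, the sketch below computes the false positive and true positive rates for a given pair. It takes the threshold pair as given; the search for optimal pairs described in the Literature Review is not reproduced. The function name `or_rule_operating_point` and the toy data are hypothetical.

```python
def or_rule_operating_point(post1, post2, labels, t1, t2):
    """Return (false positive rate, true positive rate) for the OR rule.

    post1, post2: posterior probabilities of class 1 from classifiers 1 and 2.
    labels: true class of each exemplar (0 or 1).
    An exemplar is declared class 1 if either posterior meets its threshold.
    """
    tp = fp = positives = negatives = 0
    for p1, p2, y in zip(post1, post2, labels):
        declared = (p1 >= t1) or (p2 >= t2)
        if y == 1:
            positives += 1
            tp += declared
        else:
            negatives += 1
            fp += declared
    return fp / negatives, tp / positives


# Toy posteriors for four exemplars (two per class), for illustration only.
post1 = [0.9, 0.2, 0.6, 0.1]
post2 = [0.3, 0.7, 0.2, 0.4]
labels = [1, 1, 0, 0]
point_loose = or_rule_operating_point(post1, post2, labels, 0.5, 0.5)
point_strict = or_rule_operating_point(post1, post2, labels, 0.7, 0.5)
```

Sweeping a list of threshold pairs through this function traces out the fused curve.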

PNN  Application 

The  PNN  Fusion  model  treats  the  posterior  probabilities  from  the  individual 
classifiers as features and outputs an overall posterior probability of an exemplar being in
a given class using the methodology described in the Literature Review. Unlike the
ISOC  Fusion  model  and  the  ROC  “Within”  Fusion  model,  the  PNN  only  observes  the 
posterior  probabilities  from  the  third  data  set.  The  PNN  is  trained  on  the  first  1/3  of  the 
data  set  and  validated  on  the  last  2/3  of  the  data  set.  On  the  validation  set,  the  threshold 
was  varied  from  0.0  to  1.0.  The  false  positive  values  were  plotted  against  the  true 
positive  values  as  the  thresholds  were  varied;  the  result  is  six  ROC  curves,  one  for  each 
level  of  correlation.  This  process  was  replicated,  and  the  average  ROC  curve  from  the 
replications  was  calculated. 
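A minimal sketch of this kind of fusion is given below, assuming a two-class PNN with Gaussian kernels, equal priors, and a smoothing parameter `sigma` chosen arbitrarily; none of these settings are stated in this section. The split helper assumes the exemplars are already in random order, since otherwise the first third could contain a single class. All names are hypothetical.

```python
import math


def split_first_third(rows):
    """First 1/3 of the rows for training, last 2/3 for validation."""
    cut = len(rows) // 3
    return rows[:cut], rows[cut:]


def pnn_posterior(x, train_x, train_y, sigma=0.1):
    """Fused posterior of class 1 for the posterior pair x.

    train_x: posterior pairs from the two individual classifiers.
    train_y: true class (0 or 1) of each training exemplar.
    sigma: kernel width (an assumed value, not from the thesis).
    """
    kernel_sum = {0: 0.0, 1: 0.0}
    count = {0: 0, 1: 0}
    for xi, yi in zip(train_x, train_y):
        dist_sq = sum((a - b) ** 2 for a, b in zip(x, xi))
        kernel_sum[yi] += math.exp(-dist_sq / (2.0 * sigma ** 2))
        count[yi] += 1
    if count[0] == 0 or count[1] == 0:
        raise ValueError("training set must contain both classes")
    density0 = kernel_sum[0] / count[0]
    density1 = kernel_sum[1] / count[1]
    total = density0 + density1
    return 0.5 if total == 0.0 else density1 / total


train_x = [(0.1, 0.2), (0.9, 0.8)]
train_y = [0, 1]
fused = pnn_posterior((0.8, 0.9), train_x, train_y)
```

Thresholding `fused` from 0.0 to 1.0 on the validation exemplars then yields the ROC curve described above.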

One  Big  Network  Application 

The  One  Big  Network  model  eliminates  the  individual  classifiers  and  takes  all 
features  as  inputs  to  a  generalized  regression  neural  network  using  the  methodology 
described in the Literature Review. The false
positive  values  were  plotted  against  the  true  positive  values  as  the  thresholds  were  varied; 
the  result  is  six  ROC  curves,  one  for  each  level  of  correlation.  This  process  was 
replicated,  and  the  average  ROC  curve  from  the  replications  was  calculated. 

Feature  Selection 

In  problems  where  each  classifier  observes  many  features,  many  of  which  are 
noise  or  redundant  features,  it  may  be  beneficial  to  select  only  those  features  that  are 
relevant  for  classification.  This  prevents  the  classifier  from  being  confused  by  those 
features  that  add  little  to  classification  accuracy. 

Total  Probability  of  Misclassification 

The total probability of misclassification (TPM) is calculated by summing the two
error probabilities: probability of false positive and probability of false negative. This
calculation can be made for both of the classifiers used in this thesis, the linear
discriminant function and the quadratic discriminant function, by using only those
features that the classifier actually observes. This calculation will give a prior estimation
of the errors to be observed by the classifiers. Another approach is to calculate an overall
TPM, as if all the features were observed by one big classifier. This calculation will also
give a prior estimation of the errors to be observed by the fusion. As the correlation
between feature sets increases, less independent information is presented to the
classifiers. Thus, as the correlation between feature sets increases, the TPM is expected
to increase; as less independent information is presented to the classifiers, more errors are
expected.
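For illustration, an empirical counterpart of the TPM can be computed from a set of decisions by summing the observed false positive and false negative rates, following the definition above. This is a sketch on toy data, not the prior (distribution-based) calculation the section refers to; the function name `empirical_tpm` is hypothetical.

```python
def empirical_tpm(labels, decisions):
    """Sum of the false positive rate and the false negative rate.

    labels: true class of each exemplar (0 or 1).
    decisions: declared class of each exemplar (0 or 1).
    """
    false_pos = sum(1 for y, d in zip(labels, decisions) if y == 0 and d == 1)
    false_neg = sum(1 for y, d in zip(labels, decisions) if y == 1 and d == 0)
    negatives = sum(1 for y in labels if y == 0)
    positives = sum(1 for y in labels if y == 1)
    return false_pos / negatives + false_neg / positives


labels = [1, 1, 0, 0, 0, 0]
decisions = [1, 0, 1, 0, 0, 0]
tpm = empirical_tpm(labels, decisions)  # 1/4 false positives + 1/2 false negatives
```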

Sample  Size  Variation 

Another experiment was designed to examine the effects of sample size. For this
experiment, the sample size is defined to be the number of exemplars in each class in
each data set. For each sample size, the above fusion methods were performed, and the
resulting curves were generated. Realistically, the actual amount of correlation
present in the data will not be known ahead of time. It may also be realistic to assume
that each of the six levels of correlation is equally likely to be observed.
Therefore, the six ROC curves generated for each method, one for each level of
correlation, were averaged into one ROC curve for each sample size. Then, the average
ROC curve for each of the sample sizes was plotted to see the effects of sample size on
the fusion process.
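The averaging step can be sketched as below, assuming the six curves were produced by the same threshold sweep so that they can be averaged point by point (the text does not say how the averaging was carried out). The function name `average_roc` and the toy curves are hypothetical.

```python
def average_roc(curves):
    """Average several ROC curves point by point.

    curves: list of curves, each a list of (false positive, true positive)
    pairs generated on the same threshold grid.
    """
    n_points = len(curves[0])
    if any(len(curve) != n_points for curve in curves):
        raise ValueError("all curves must share the same threshold grid")
    averaged = []
    for k in range(n_points):
        fp = sum(curve[k][0] for curve in curves) / len(curves)
        tp = sum(curve[k][1] for curve in curves) / len(curves)
        averaged.append((fp, tp))
    return averaged


# Two toy curves standing in for two correlation levels.
curve_a = [(0.0, 0.0), (0.2, 0.6), (1.0, 1.0)]
curve_b = [(0.0, 0.0), (0.4, 0.8), (1.0, 1.0)]
mean_curve = average_roc([curve_a, curve_b])
```

Equal weights in the average correspond to the assumption that each correlation level is equally likely.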




Chapter  Summary 

This  chapter  described  the  methodology  employed  throughout  this  thesis.  First, 
the  two  types  of  correlation  used  in  this  thesis  were  presented.  Then,  the  data  generation 
process  for  each  of  the  problems  was  provided.  Next,  the  overall  experimental  design,  as 
well  as  some  general  application  issues  for  each  of  the  four  fusion  methods,  was  given. 
Finally, feature selection, TPM, and sample size variation were discussed.




IV.  Findings  and  Analysis 


Introduction 

After the data was generated, the four fusion methods were performed, and results
were generated and analyzed for each of the problems described in the data generation
section. This section provides findings and analysis for each of those problems.
Table 5 summarizes how the results are presented. All results are averages
over 5 replications.


Table 5: Results Descriptions.

Problem #   Problem Name                              Results Description
1           4 Feature Case                            ROC curves, N=1000
                                                      ROC curves, Across Sample Sizes
2           8 Feature Case                            ROC curves, N=1000
                                                      ROC curves, Across Sample Sizes
3           8 Feature with Autocorrelation Case       ROC curves, N=1000
                                                      ROC curves, Across Sample Sizes
4           8 Feature Triangle Case                   ROC curves, N=1000
                                                      ROC curves, Across Sample Sizes
5           8 Feature XOR Case                        ROC curves, N=1000
                                                      ROC curves, Across Sample Sizes
6           8 Feature XOR with Autocorrelation Case   ROC curves, N=1000
                                                      ROC curves, Across Sample Sizes
7           20 Feature with Feature Selection Case    ROC curves, N=50, N=1000
                                                      Feature Selection vs Non-Feature Selection
8           36 Feature with Feature Selection Case    Classification Accuracy, N=50, rho=0.8
9           TPM Exploration                           ROC curves, N=50, N=1000
                                                      3 specific levels of correlation

Problem  1  Results:  4  Feature  Case,  Single  Sample  Size 


Features were generated according to the methodology described in the Data
Generation section for the 4 feature case without autocorrelation, and the fusion process
was followed as described in the Experimental Design section. Figure 8 shows four plots,
one for each of the four fusion methods. These are average ROC curves over five
replications with 1000 exemplars in each class. Each plot contains six ROC curves, one for
each of the six levels of correlation. In addition, crosshairs are placed at the point
(0.1, 0.6) to add a point of reference common to all four plots.


[Figure 8 panels, each titled "Correlation Comparison N=1000": ISOC Fusion, ROC "Within" Fusion, PNN Fusion, One Big Network Fusion.]

Figure 8: 4 Feature ROC Curves, N=1000.

From these plots, some meaningful conclusions can be drawn. First, ISOC Fusion
and ROC “Within” Fusion appear to be very robust to correlation; however, they are on
the low end of performance. On the other hand, the PNN is not as robust to correlation as
the first two methods; that is, its performance varies with the level of across
correlation. However, the simplistic PNN performs as well as ISOC and ROC “Within”
at high levels of correlation, and it drastically outperforms them at low levels of
correlation. Again, the PNN observes only half of the data that ISOC, ROC “Within”,
and OBN observe. The OBN approach performs comparably to the PNN.

Problem  1  Results:  4  Feature  Case,  Varying  Sample  Size 

Features were generated according to the methodology described in the Data
Generation section for the 4 feature case without autocorrelation, and the fusion process
was followed as described in the Experimental Design section. This process was repeated
for multiple sample sizes. Figure 9 shows four plots, one for each of the four fusion
methods. They are the average ROC curves over the six levels of correlation for a given
sample size. Each plot contains six ROC curves, one for each of the six sample sizes. In
addition, crosshairs are placed at the point (0.1, 0.6) to add a point of reference common
to all four plots.




[Figure 9 panels, each titled "Sample Size Comparison": ISOC Fusion, ROC "Within" Fusion, PNN Fusion, One Big Network Fusion.]

Figure 9: 4 Feature ROC Curves, Across Sample Sizes.

From these plots, it is evident that in this simple problem, sample size is not
much of a factor. For each of the four methods, the ROC curve for a sample size of 25
differs from the ROC curve for a sample size of 50. Beyond a sample size of 50, little
performance is gained as the sample size is increased to 2000. ISOC and OBN appear to be
the most robust to sample size, and the ROC curves for the PNN appear to be better at all
sample sizes.


Problem  2  Results:  8  Feature  Case,  Single  Sample  Size 

Features were generated according to the methodology described in the Data
Generation section for the 8 feature case without autocorrelation, and the fusion process
was followed as described in the Experimental Design section. Figure 10 shows four plots,
one for each of the four fusion methods. They are average ROC curves over five
replications with 1000 exemplars in each class. Each plot contains six ROC curves, one for
each of the six levels of correlation. In addition, crosshairs are placed at the point
(0.1, 0.4) to add a point of reference common to all four plots.


[Figure 10 panels, each titled "Correlation Comparison N=1000": ISOC Fusion, ROC "Within" Fusion, PNN Fusion, One Big Network Fusion.]

Figure 10: 8 Feature ROC Curves, N=1000.

From the plots above, many of the same conclusions can be drawn. First, as
described in the data generation section, the means in this problem are closer together.
That is the cause of the shift in the ROC curves; it is not the addition of the noise and
redundant features. Second, as in the 4 feature problem, the ISOC and ROC “Within”
Fusion methods are the most robust, but they are on the low end of performance. The
OBN is fairly robust, and it outperforms the ISOC and ROC “Within” at 0.0 level of
correlation. The PNN does as well as ISOC, ROC “Within,” and OBN at high levels of
correlation, and it outperforms them at low levels of correlation.

Problem  2  Results:  8  Feature  Case,  Varying  Sample  Size 

Features were generated according to the methodology described in the Data
Generation section for the 8 feature case without autocorrelation, and the fusion process
was followed as described in the Experimental Design section. This process was repeated
for multiple sample sizes. Figure 11 shows four plots, one for each of the four fusion
methods. They are the average ROC curves over the six levels of correlation for a given
sample size. Each plot contains six ROC curves, one for each of the six sample sizes. In
addition, crosshairs are placed at the point (0.1, 0.4) to add a point of reference common
to all four plots.




[Figure 11 panels, each titled "Sample Size Comparison": ISOC Fusion, ROC "Within" Fusion, PNN Fusion, One Big Network Fusion.]

Figure 11: 8 Feature ROC Curves, Across Sample Sizes.

The sample size effect is more evident in the 8 feature case than it was in the 4
feature case. There is a fairly obvious break between the first three sample sizes and the
second three sample sizes for all four fusion methods. After a sample size of 250, there is
little increase in performance by increasing the sample size up to 500 or 1000. The
ISOC, ROC “Within,” and OBN methods seem to perform about the same for all sample
sizes. The PNN does as well as the other three methods at the lower sample sizes, and it
outperforms the other three methods at the higher sample sizes.




Problem  3  Results:  8  Feature  with  Autocorrelation  Case,  Single  Sample  Size 


Features were generated according to the methodology described in the Data
Generation section for the 8 feature case with autocorrelation, and the fusion process was
followed as described in the Experimental Design section. Figure 12 shows one feature over
time at 0.0 level of autocorrelation, and Figure 13 shows one feature over time at 0.9
level of autocorrelation.

[Figure 12 plot, titled "Feature 1 over time: 0.0 level of autocorrelation"; horizontal axis: Time, 0 to 2000.]

Figure 12: Feature 1 over Time: 0.0 Level of Autocorrelation.

[Figure 13 plot, titled "Feature 1 over time: 0.9 level of autocorrelation"; horizontal axis: Time, 0 to 2000.]

Figure 13: Feature 1 over Time: 0.9 Level of Autocorrelation.

Figure  12  shows  that,  as  expected,  feature  1  is  independent  over  time;  this  is 
evident  because  there  is  no  pattern  in  the  data  over  time.  Figure  13  shows  that 
autocorrelation  is  present;  this  is  evident  because  there  is  a  definite  pattern  over  time. 
Thus,  the  appropriate  levels  of  autocorrelation  are  present. 

Figure 14 shows four plots, one for each of the four fusion methods. Each plots the
true positive value at a false positive value of 0.1, read from the average ROC curves over
five replications with 1000 exemplars in each class, against the level of across
correlation. Each plot contains three lines, one for each of the
three levels of autocorrelation.
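Reading the true positive value at a fixed false positive value off a sampled ROC curve can be sketched as follows, assuming linear interpolation between neighboring curve points (the text does not state how the value at 0.1 was obtained). The function name `tp_at_fp` and the toy curve are hypothetical.

```python
def tp_at_fp(curve, target_fp=0.1):
    """True positive value at target_fp, by linear interpolation.

    curve: list of (false positive, true positive) pairs.
    """
    points = sorted(curve)
    for (fp0, tp0), (fp1, tp1) in zip(points, points[1:]):
        if fp0 <= target_fp <= fp1:
            if fp1 == fp0:
                return max(tp0, tp1)  # vertical segment: take the upper end
            weight = (target_fp - fp0) / (fp1 - fp0)
            return tp0 + weight * (tp1 - tp0)
    raise ValueError("target_fp lies outside the curve")


toy_curve = [(0.0, 0.0), (0.05, 0.3), (0.2, 0.6), (1.0, 1.0)]
tp_value = tp_at_fp(toy_curve)  # interpolates between (0.05, 0.3) and (0.2, 0.6)
```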




[Figure 14 panels, each titled "TP for an FP=0.1 vs correlation, N=1000": ISOC, PNN, ROC "Within", OBN. Horizontal axis: correlation; vertical axis: 0 to 1; one line each for AutoCorr=0.0, AutoCorr=0.5, and AutoCorr=0.9.]

Figure 14: 8 Feature with Autocorrelation Case, N=1000.

Figure 14 shows that there is an effect of autocorrelation, although it is not
dramatic at this high sample size. It also seems to make a more significant difference in
those  fusion  methods  that  assume  independence  of  the  classifiers,  ISOC  and  ROC 
“Within.”  There  is  less  of  a  difference  in  those  fusion  methods  that  make  no  such 
assumption  about  the  classifiers,  PNN  and  OBN.  Once  again,  at  this  sample  size,  the 
OBN  and  PNN  fusion  far  exceed  the  performance  of  ISOC  and  ROC  “Within”  at  low 
levels  of  correlation  and  perform  at  least  as  well  at  high  levels  of  correlation. 




Problem  3  Results:  8  Feature  with  Autocorrelation  Case,  Across  Sample  Sizes 


Features were generated according to the methodology described in the Data
Generation section for the 8 feature case with autocorrelation, and the fusion process was
followed as described in the Experimental Design section. This process was repeated for
multiple sample sizes. Figure 15 shows four plots, one for each of the four fusion methods.
They are the true positive values for a false positive value of 0.1 on the average ROC
curves over the six levels of correlation for a given sample size. Sample size is varied
from 50 exemplars in each class to 1000 exemplars in each class. Each plot contains three
lines, one for each of the three levels of autocorrelation.


[Figure 15 panels, each titled "TP for an FP=0.1 vs sample size": ISOC, PNN, ROC "Within", OBN. Horizontal axis: sample size in each class, 0 to 1000; vertical axis: 0 to 1; one line each for AutoCorr=0.0, AutoCorr=0.5, and AutoCorr=0.9.]

Figure 15: 8 Feature with Autocorrelation Case, Across Sample Sizes.


Figure 15 shows that the PNN is fairly robust to the autocorrelation while the
other three methods are more affected by the presence of autocorrelation. This is
especially true as the level of autocorrelation reaches 0.9. That is, there is little difference
between 0.0 level of autocorrelation and 0.5 level of autocorrelation for any of the three
methods; however, there is a difference between 0.5 level of autocorrelation and 0.9 level
of autocorrelation. This degradation in performance is more dramatic as the sample size
decreases. At high sample size, there is little difference between 0.0 level of autocorrelation
and 0.9 level of autocorrelation for any of the three methods. By the time the sample size
drops to 500 or 200, there is significant degradation in performance for all methods
except the PNN. In conclusion, the PNN and OBN outperform the other two methods
regardless of the level of autocorrelation or the sample size.

Problem  3  Results:  8  Feature  with  Autocorrelation  Case,  An  ANOVA  Approach 


This  same  set  of  data  can  also  be  examined  using  an  ANOVA  approach. 

Consider  a  three  factor  design  where  the  three  factors  are  Level  of  Autocorrelation,  Level 
of  Across  Correlation,  and  Sample  Size.  Level  of  Autocorrelation  has  three  levels,  0.0, 
0.5,  and  0.9.  Level  of  Across  Correlation  has  six  levels,  0.0,  0.2,  0.4,  0.6,  0.8,  and  0.9. 
Sample  Size  has  five  levels,  50,  100,  250,  500,  and  1000  exemplars  in  each  class.  A  full 
factorial  design  consists  of  all  possible  combinations  of  these  three  factors.  Each  design 
point  was  replicated  five  times;  there  were  a  total  of  450  runs  for  each  of  the  four 
methods.  Table  6  summarizes  the  results  of  each  ANOVA. 
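The run count follows directly from the factor levels listed above. A short enumeration, with variable names chosen only for illustration, reproduces the 90 design points and 450 runs per method:

```python
from itertools import product

# Factor levels as listed in the text.
AUTOCORRELATION = [0.0, 0.5, 0.9]
ACROSS_CORRELATION = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
SAMPLE_SIZE = [50, 100, 250, 500, 1000]
REPLICATIONS = 5

# Full factorial: every combination of the three factors.
design_points = list(product(AUTOCORRELATION, ACROSS_CORRELATION, SAMPLE_SIZE))

# Each design point is replicated five times.
runs = [(auto, across, size, rep)
        for (auto, across, size) in design_points
        for rep in range(1, REPLICATIONS + 1)]
```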




Table 6: Summary of ANOVA Results.

Method              ISOC      ROC “Within”   PNN       OBN
R^2                 0.406     0.371          0.372     0.383
Adjusted R^2        0.259     0.215          0.216     0.230
Mean Response       0.572     0.532          0.623     0.617
Root MSE            0.134     0.151          0.154     0.130

Factors Considered and corresponding p-values
Autocorrelation     <.0001    <.0001         0.1639    <.0001
Sample Size         <.0001    <.0001         <.0001    <.0001
Auto*Sample         <.0001    0.1431         0.5337    0.6097
Correlation         0.8973    0.2866         <.0001    <.0001
Auto*Corr           0.3613    0.8030         0.7965    0.5096
Corr*Sample         0.6699    0.2815         0.0801    0.3318
Auto*Corr*Sample    0.8428    0.9243         0.9997    0.9995

These results seem to confirm statistically the results that were already shown
graphically. ISOC and ROC “Within” are very robust to across correlation, but they are
not robust to autocorrelation or sample size. In other words, changing the autocorrelation
level and sample size level will have an impact on the performance of these two types of
fusion. The OBN is not robust to any of the three factors. The PNN is robust to
autocorrelation level, but the PNN performance will change as the level of
across correlation and sample size change. The good news for the PNN and OBN is that
although their performance is less robust to across correlation, they do as well as or
better than the other two methods. This is also apparent in the Mean Response for each
method. PNN has the highest mean response, and OBN has the second highest mean
response. ISOC has the third highest mean response, and ROC “Within” has the lowest
mean response. It is also worth noting that none of the $R^2$ or adjusted $R^2$ values are
particularly good: the models explain only about 37% to 41% of the variation in the data,
or 22% to 26% after adjustment.
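As a consistency check on Table 6, the adjusted values can be recovered from the unadjusted ones. The sketch below assumes the first row of the table is $R^2$, that $n = 450$ runs were fit, and that the full factorial model with all interactions uses $p = 3 \cdot 6 \cdot 5 - 1 = 89$ terms besides the intercept; the value of $p$ is inferred from the design, not stated in the text.

$$
R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
          = 1 - (1 - R^2)\,\frac{449}{360}
$$

$$
\text{ISOC: } 1 - (1 - 0.406)\,\frac{449}{360} \approx 0.259,
\qquad
\text{OBN: } 1 - (1 - 0.383)\,\frac{449}{360} \approx 0.230
$$

Both agree with the tabulated adjusted values, which supports reading the two rows as $R^2$ and adjusted $R^2$.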




Validation  of  the  assumptions,  normal  errors  and  constant  variance,  via  residual 
analysis  was  done  for  each  of  the  four  methods.  All  four  followed  the  same  pattern  so 
only  those  for  ISOC  are  shown.  Figure  16  shows  a  histogram  of  the  residuals;  Figure  17 
shows  the  residuals  vs  Row  Number. 


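[Figure 16 plot: histogram of residuals; horizontal axis: Residual Value.]

Figure 16: ISOC Histogram of Residuals.

[Figure 17 plot: residuals against row number; horizontal axis: Row Number.]

Figure 17: ISOC Residual TP Probability vs Row Number.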

Figure 16 shows that the errors are approximately normally distributed, so the
normal error assumption holds. Figure 17 shows that there is a violation of the constant
variance assumption. This means that the variance of the residuals is not constant over
the entire process. This plot can be divided into three parts: the first 150 rows correspond
to 0.0 level of autocorrelation, the second 150 rows correspond to 0.5 level of
autocorrelation, and the final 150 rows correspond to 0.9 level of autocorrelation. The
first 300 rows seem to have constant variance; that is, for all the responses for 0.0 level of
autocorrelation and 0.5 level of autocorrelation, the variance seems to stay the same
regardless of the level of across correlation or sample size. The variance of the residuals
seems to explode right around row 300. This is the start of the 0.9 level of
autocorrelation. At this point, there is a much wider variance in responses than for the
first two levels of autocorrelation. Figure 18 is a similar plot, but the rows are in slightly
different order. Now, the data is first sorted by autocorrelation in ascending order and
then by sample size in descending order.


Figure 18: ISOC Residuals vs Row Number, Resorted (x-axis: Row Number).

Figure 18 shows an even more discernible pattern of heteroskedasticity. Within
each group of 150 rows, the pattern is even more evident: as the sample size
decreases, the variance within a level of autocorrelation increases. Thus, one can expect
the variance to increase not only as the autocorrelation level increases, but also as the
sample size decreases.
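The pattern read off the residual plots can also be checked numerically by grouping the residuals by factor level and comparing the spread within each group. The following is a minimal sketch of that idea, not the procedure used in this thesis; the function name `group_variances` and the toy residual values are hypothetical.

```python
from collections import defaultdict
from statistics import pvariance


def group_variances(levels, residuals):
    """Return {level: population variance of the residuals observed at that level}."""
    groups = defaultdict(list)
    for level, residual in zip(levels, residuals):
        groups[level].append(residual)
    return {level: pvariance(values) for level, values in groups.items()}


# Toy residuals, made up for illustration: wider spread at the 0.9 level.
levels = [0.0, 0.0, 0.0, 0.0, 0.9, 0.9, 0.9, 0.9]
residuals = [0.01, -0.01, 0.02, -0.02, 0.10, -0.10, 0.20, -0.20]
variances = group_variances(levels, residuals)
```

A markedly larger variance in one group than in the others, as in the toy data, is the numerical counterpart of the fanning-out seen in a residual plot.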

Since  there  is  so  much  variability  in  the  data,  another  approach  is  to  average  the 
five  replications  for  each  design  point  and  use  the  average  TP  value  as  the  response. 
Table  7  shows  the  results  for  this  analysis. 
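As an illustration of the averaging step, the sketch below collapses replicated runs into one mean response per design point. It is a hedged example only: the row layout (autocorrelation, across correlation, sample size, TP) and the name `average_replications` are assumptions made for illustration, and the TP values are made up.

```python
from collections import defaultdict
from statistics import mean


def average_replications(rows):
    """rows: iterable of (autocorr, corr, n, tp) tuples.

    Returns {(autocorr, corr, n): mean TP over that design point's replications}.
    """
    cells = defaultdict(list)
    for autocorr, corr, n, tp in rows:
        cells[(autocorr, corr, n)].append(tp)
    return {point: mean(tps) for point, tps in cells.items()}


# Two hypothetical design points with two replications each.
rows = [
    (0.0, 0.2, 50, 0.50), (0.0, 0.2, 50, 0.60),
    (0.9, 0.2, 50, 0.30), (0.9, 0.2, 50, 0.50),
]
averaged = average_replications(rows)
```

Averaging removes the replication-to-replication scatter from the response, which is why a model fitted to the averaged values leaves less unexplained variation.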




Table 7: Summary of ANOVA Results, Averaged.

Method             ISOC      ROC “Within”   PNN       OBN
R^2                0.871     0.894          0.926     0.906
Adjusted R^2       0.705     0.764          0.835     0.791
Mean Response      0.557     0.524          0.623     0.604
Root MSE           0.060     0.064          0.043     0.051

Factors Considered and corresponding p-values
Autocorrelation    <.0001    <.0001         0.0160    <.0001
Sample Size        <.0001    <.0001         <.0001    <.0001
Auto*Sample        0.0005    0.0040         0.0459    0.0053
Correlation        0.7204    0.8582         <.0001    <.0001
Auto*Corr          0.2231    0.0444         0.1519    0.2656
Corr*Sample        0.4223    0.1227         0.0002    0.1448

The same trends hold in this analysis in terms of the most significant factors for a
given method. Also, the mean responses hold the same ranking. That is, PNN has the
highest mean response, and OBN has the second highest mean response. ROC “Within”
has the lowest mean response, and ISOC has the second lowest mean response. This
analysis does reduce the root mean square error since there is less inherent variability in
the averaged responses. Finally, the R^2 and Adjusted R^2 values for all four methods are
much higher than they were in the previous analysis. Again, since there is much less
variability in the averaged data, the model does a much better job of explaining the
variation in the data.

Validation  of  the  assumptions,  normal  errors  and  constant  variance,  via  residual 
analysis  was  also  done  for  each  of  the  four  methods  in  this  analysis.  All  four  followed 
the  same  pattern  so  only  those  for  ISOC  are  shown.  Figure  19  shows  a  histogram  of  the 
residuals; Figure 20 shows the residuals vs Row Number.


Figure 19: ISOC Histogram of Average Residuals (x-axis: Residual Value; y-axis: Frequency).

Figure 20: ISOC Residual TP Probability vs Row Number (x-axis: Row Number).

These residual plots show that averaging resolves the violation of assumptions
noted above. Figure 19 shows that the residuals are approximately normally distributed,
and Figure 20 shows that there is approximately constant variance over the entire
process. This means that the data in this new analysis does not violate either of the two
assumptions.




Problem  4  Results:  8  Feature  Triangle  Case,  Single  Sample  Size 

Features were generated according to the methodology described in the Data
Generation: 8 Feature Triangle Problem section, and the fusion process was followed as
described in the Experimental Design section. Figure 21 shows four plots, one for each
of the four fusion methods. They are average ROC curves over five replications with
1000 exemplars in each class. Each plot contains six ROC curves, one for each of the six
levels of correlation. In addition, crosshairs are placed at the point (0.1, 0.4) to add a
point of reference common to all four plots.


Figure 21: 8 Feature Triangle ROC Curves, N=1000. Panels: ISOC Fusion, ROC “Within”
Fusion, PNN Fusion, and One Big Network Fusion Correlation Comparison, N=1000.


There is a wide difference between the four methods in this problem. First of all,
ISOC is the most robust to correlation, and ROC “Within” is second most robust to
correlation. Although they are the most robust to correlation, they are both on the very
low end of performance. On the other hand, the PNN and OBN are not very robust to
correlation, but they outperform the other two methods at all levels of correlation. They
far outperform the other two methods at high levels of correlation and slightly outperform
them at low levels of correlation. In general, in the simpler problems, a higher level of
correlation results in a lower level of performance. In this problem, there is an
inverse effect; this is particularly evident in the PNN and OBN results. The higher level
of correlation results in a higher level of performance.

At  first  glance,  this  result  seems  highly  counter-intuitive,  but  a  look  at  the 
geometry  of  the  problem  provides  valuable  insight  into  the  results  for  the  PNN.  Figure 
22  shows  a  plot  of  the  posterior  probabilities  from  the  linear  classifier  vs  the  posterior 
probabilities  from  the  quadratic  classifier  for  the  0.0  correlation  level.  In  essence,  this  is 
the  feature  space  of  the  PNN. 


Figure 22: PNN Feature Space Plot, 0.0 Correlation, N=1000 (plot title: Posterior
Probability of Individual Classifiers, rho=0.0, N=1000; x-axis: Posterior Probability Linear
Classifier; y-axis: Posterior Probability Quadratic Classifier; legend: Class 0, Class 1).

Figure  22  shows  very  little  separation  between  the  two  classes;  thus,  it  is  difficult 
for  the  PNN  to  distinguish  between  the  two  classes.  However,  this  plot  is  much  different 
when  the  correlation  is  0.9.  Figure  23  shows  a  plot  of  the  posterior  probabilities  from  the 
linear  classifier  vs  the  posterior  probabilities  from  the  quadratic  classifier  for  the  0.9 
correlation  level. 


Figure 23: PNN Feature Space Plot, 0.9 Correlation, N=1000 (plot title: Posterior
Probability of Individual Classifiers, rho=0.9, N=1000).

As  is  obvious  from  Figure  23,  adding  the  correlation  significantly  alters  the 
geometry  of  the  problem  posed  to  the  PNN.  Now,  the  PNN  can  very  easily  solve  the 
problem  since  the  classes  are  much  more  distinguishable.  This  explains  why,  in  this 
problem,  the  PNN  has  increased  performance  as  the  correlation  level  increases. 

The OBN results also seem highly counter-intuitive, but as was the case with the
PNN, a look at the geometry of the OBN problem provides valuable insight into the
results. Figure 24 shows the feature space of feature 1 and feature 2 at 0.0 level of
correlation. Figure 25 shows the feature space of feature 1 and feature 2 at 0.9 level of
correlation.


Figure 24: Feature Space of Feature 1 and Feature 2, 0.0 Correlation, N=1000 (plot
title: Feature Space rho=0.0, N=1000; x-axis: Feature 1).

Figure 25: Feature Space of Feature 1 and Feature 2, 0.9 Correlation, N=1000 (x-axis:
Feature 1).

Neither  of  these  plots  is  unexpected.  Since  Feature  1  and  Feature  2  are  always 
independent,  the  correlation  level  does  not  change  the  geometry  of  the  plot.  Figure  24 
shows  very  little  separation  between  the  two  classes  in  the  feature  space  of  feature  1  and 
feature  2  at  0.0  level  of  correlation,  and  Figure  25  shows  very  little  separation  between 
the  two  classes  in  the  feature  space  of  feature  1  and  feature  2  at  0.9  level  of  correlation. 
Thus,  it  is  difficult  for  the  OBN  to  distinguish  between  the  two  classes  in  this  dimension 
regardless  of  correlation. 

Figure  26  shows  the  feature  space  of  feature  1  and  feature  4  at  0.0  level  of 
correlation.  Figure  27  shows  the  feature  space  of  feature  1  and  feature  4  at  0.9  level  of 
correlation. 


Figure 26: Feature Space of Feature 1 and Feature 4, 0.0 Correlation, N=1000 (plot
title: Feature Space rho=0.0, N=1000; x-axis: Feature 1; legend: Class 0, Class 1).

Figure 27: Feature Space of Feature 1 and Feature 4, 0.9 Correlation, N=1000 (plot
title: Feature Space rho=0.9, N=1000; x-axis: Feature 1).

Since there is no correlation between features 1 and 4 in Figure 26, there is still
little separation between the classes. At a high level of correlation, the shape of the
feature space in this dimension is significantly changed such that the OBN can easily tell
the difference between the two classes. This change in geometry explains why the OBN
performance increases as the level of correlation increases. As the problem becomes
slightly more complicated, the ISOC and ROC “Within” methods continue to diminish in
performance while the OBN and PNN methods continue to outperform them at some
levels of correlation.




Problem  4  Results:  8  Feature  Triangle  Case,  Varying  Sample  Size 




Features were generated according to the methodology described in the Data
Generation: 8 Feature Triangle Problem section, and the fusion process was followed as
described in the Experimental Design section. This process was repeated for multiple
sample sizes. Figure 28 shows four plots, one for each of the four fusion methods. They
are the average ROC curves over the six levels of correlation for a given sample size.
Each plot contains six ROC curves, one for each of the six sample sizes. In addition,
crosshairs are placed at the point (0.1, 0.4) to add a point of reference common to all four
plots.


Figure 28: 8 Feature Triangle ROC Curves, Across Sample Sizes. Panels: ISOC Fusion,
ROC “Within” Fusion, PNN Fusion, and One Big Network Fusion Sample Size Comparison.


As the problem becomes more complicated, more samples are needed. The first
three sample sizes are fairly well separated in all four methods while there is little
difference between N=500 and N=1000 for any of the four methods. This means that
once the sample size is 500, there is little to be gained by increasing the sample size in
this problem. Again, as the problem becomes slightly more complicated, the ISOC and
ROC “Within” methods continue to diminish in performance while the OBN and PNN
methods continue to outperform them at some levels of correlation.

Problem 5 Results: 8 Feature XOR Case, Single Sample Size

Features were generated according to the methodology described in the Data
Generation: 8 Feature XOR Problem section, and the fusion process was followed as
described in the Experimental Design section. Figure 29 shows four plots, one for each
of the four fusion methods. They are average ROC curves over five replications with
1000 exemplars in each class. Each plot contains six ROC curves, one for each of the six
levels of correlation. In addition, crosshairs are placed at the point (0.1, 0.2) to add a
point of reference common to all four plots.


Figure 29: 8 Feature XOR ROC Curves, N=1000. Panels: ISOC Fusion, ROC “Within”
Fusion, PNN Fusion, and One Big Network Fusion Correlation Comparison, N=1000.

A linear discriminant function cannot adequately solve the XOR problem, so the
linear classifier is not a good classifier. With adequate separation of the classes, the
quadratic discriminant function could solve the XOR problem, but in this problem,
without adequate separation, the quadratic classifier is not a good classifier either. Thus,
neither ISOC nor ROC “Within” can improve upon the two bad classifiers it fuses. Now
that the problem has become too complicated for either the linear or quadratic classifier
to adequately solve, the posterior probabilities from each of the classifiers do not offer
much more information than the binary predicted classes. Thus, the PNN does not
improve upon either the ROC “Within” or the ISOC Fusion method. On the other hand,
since the OBN eliminates the bad classifiers altogether, it is able to mildly outperform the
other three methods at low levels of correlation, and it is able to greatly outperform the
other three methods at high levels of correlation. This improvement as a result of
increasing correlation is due to geometric effects similar to those shown in previous
sections.
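The claim about linear and quadratic discriminants can be illustrated on the textbook two-feature, noise-free XOR layout. This is a simplified sketch under stated assumptions, not the 8 feature problem or the classifiers used in this thesis: the four points, the +/-1 coding, the weight grid, and the helper names `accuracy` and `linear_rule` are all hypothetical.

```python
from itertools import product

# XOR with +/-1 coding: class 1 when the two inputs differ.
points = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
labels = [0, 1, 1, 0]


def accuracy(predict):
    """Fraction of the four XOR points that the rule classifies correctly."""
    hits = sum(predict(x1, x2) == y for (x1, x2), y in zip(points, labels))
    return hits / len(points)


def linear_rule(w1, w2, b):
    """Linear discriminant: predict class 1 when w1*x1 + w2*x2 + b > 0."""
    return lambda x1, x2: int(w1 * x1 + w2 * x2 + b > 0)


# Search a coarse grid of linear rules; none classifies all four points.
grid = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
best_linear = max(accuracy(linear_rule(w1, w2, b))
                  for w1, w2, b in product(grid, repeat=3))

# A single cross-product (quadratic) term separates the classes exactly.
quadratic_accuracy = accuracy(lambda x1, x2: int(-x1 * x2 > 0))
```

In this idealized case the best linear rule gets three of the four points right, while the cross-product term gets all four. With overlapping classes, as in the problem above, the quadratic boundary loses that advantage.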

Interestingly,  even  when  the  classes  are  further  separated,  the  PNN  only  does  as 
well  as  the  ISOC  and  ROC  “Within.”  There  is  never  a  huge  increase  in  performance  at 
any  levels  of  correlation  as  there  was  in  the  simpler  problems.  To  show  this,  the  above 
problem  was  rerun  with  further  separation  in  the  classes  for  a  single  sample  size.  Figure 
30  shows  four  plots,  one  for  each  of  the  four  fusion  methods  where  there  is  further 
separation  in  the  classes. 


Figure 30: 8 Feature XOR ROC Curves with More Separation, N=1000. Panels: ISOC
Fusion, ROC “Within” Fusion, PNN Fusion, and One Big Network Fusion Correlation
Comparison, N=1000.


Again, the PNN performs only as well as the ISOC and ROC “Within” fusion
methods. This is explained with further investigation. Since the linear classifier is a bad
classifier in this XOR problem, the posterior probabilities from the linear classifier are
also bad. Thus, the PNN fusion method only uses the information from the quadratic
classifier. Figure 31 shows the individual classifier average ROC curves over 5
replications for 0.0 correlation, and Figure 32 shows the individual classifier average
ROC curves over 5 replications for 0.9 correlation.


Figure 31: Individual Classifier ROC Curves for 0.0 Correlation (plot title: Individual
Classifier ROCs, 0.0 Correlation, N=1000).

Figure 32: Individual Classifier ROC Curves for 0.9 Correlation (plot title: Individual
Classifier ROCs, 0.9 Correlation, N=1000; x-axis: P(FP); legend: Linear Classifier,
Quadratic Classifier).

Since the quadratic classifier observes independent information, regardless of the
across correlation, the performance of this classifier does not change as the level of
correlation changes. Also, the performance of the PNN fusion is almost identical to the
performance of the quadratic classifier. This explains the difference in performance in
the PNN fusion between this type of problem and the simpler type of problem. The OBN
continues to outperform the other methods in this type of problem because of the same
geometric explanation given in the previous problem.




Problem  5  Results:  8  Feature  XOR  Case,  Varying  Sample  Size 

Features were generated according to the methodology described in the Data
Generation: 8 Feature XOR Problem section, and the fusion process was followed as
described in the Experimental Design section. This process was repeated for multiple
sample sizes. Figure 33 shows four plots, one for each of the four fusion methods. They
are the average ROC curves over the six levels of correlation for a given sample size.
Each plot contains six ROC curves, one for each of the six sample sizes. In addition,
crosshairs are placed at the point (0.1, 0.2) to add a point of reference common to all four
plots.


Figure 33: 8 Feature XOR ROC Curves, Across Sample Sizes. Panels: ISOC Fusion, ROC
“Within” Fusion, PNN Fusion, and One Big Network Fusion Sample Size Comparison.

As the problem becomes even more complicated, more samples are needed. For
the ISOC, ROC “Within,” and PNN Fusion Methods, performance seems to increase
across the first sample sizes. After a sample size of 500 in each class is reached, the
performance stops increasing. The OBN shows a similar pattern. There is definitely a
smaller return in performance from increasing the sample size from 500 to 1000
exemplars in each class, but it may be possible to increase performance a little more by
further increasing the sample size.


Problem  6  Results:  8  Feature  XOR  with  Autocorrelation  Case,  Single  Sample  Size 


Features  were  generated  according  to  the  methodology  described  in  the  Data 
Generation:  8  feature  XOR  with  autocorrelation,  and  the  fusion  process  was  followed  as 
described  in  the  Experimental  Design  section.  Figure  34  shows  four  plots,  one  for  each 
of  the  four  fusion  methods.  They  are  the  values  of  true  positive  for  a  false  positive  value 
of  0.1  on  the  average  ROC  curves  over  five  replications  with  1000  exemplars  in  each 
class.  The  true  positive  value  is  plotted  against  the  level  of  across  correlation.  Each  plot 
contains  three  lines,  one  for  each  of  the  three  levels  of  autocorrelation. 
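The response used here, the true positive value at a false positive value of 0.1, can be read off a sampled ROC curve. The sketch below does so by linear interpolation between neighboring ROC points. The thesis does not state how the value was extracted, so the interpolation, the function name `tp_at_fp`, and the toy curve are assumptions made for illustration.

```python
def tp_at_fp(fp_points, tp_points, target_fp=0.1):
    """Linearly interpolate TP at target_fp.

    fp_points must be sorted in ascending order and pair up with tp_points.
    """
    for i in range(1, len(fp_points)):
        lo, hi = fp_points[i - 1], fp_points[i]
        if lo <= target_fp <= hi:
            if hi == lo:  # vertical ROC segment: take the higher TP
                return max(tp_points[i - 1], tp_points[i])
            weight = (target_fp - lo) / (hi - lo)
            return tp_points[i - 1] + weight * (tp_points[i] - tp_points[i - 1])
    raise ValueError("target_fp lies outside the range of fp_points")


# Toy ROC curve, made up for illustration.
fp = [0.0, 0.05, 0.2, 1.0]
tp = [0.0, 0.10, 0.4, 1.0]
value = tp_at_fp(fp, tp)
```

Reducing each ROC curve to a single number in this way is what allows the curves to be plotted against correlation and, later, used as an ANOVA response.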


Figure 34: 8 Feature XOR with Autocorrelation Case, N=1000. Panels: ISOC, ROC
“Within”, PNN, and OBN TP for an FP=0.1 vs correlation, N=1000 (x-axis: correlation;
one line each for AutoCorr=0.0, AutoCorr=0.5, and AutoCorr=0.9).

In this problem, as in previous problems, the ISOC Fusion and ROC “Within” are
very robust to correlation. Also, as autocorrelation increases from 0.0 to 0.5, there is no
degradation in performance for the ISOC and ROC “Within.” As the autocorrelation
increases from 0.5 to 0.9, there is a drop in performance across all levels of correlation,
but it is a small decrease for both ISOC and ROC “Within.” The PNN seems to be robust
to both types of correlation; that is, its three curves are mostly flat and lie on top of one
another. This is also the first problem in which the PNN performance is as low as the
ISOC and ROC “Within” performance. The OBN seems to be affected by both types of
correlation, but it always performs as well as or better than the other three methods at all
combinations of the two types of correlation. As with the first two methods, there seems
to be little difference between an autocorrelation of 0.0 and 0.5, but there is a difference
between an autocorrelation of 0.5 and 0.9. Also, there is an interesting pattern in
performance for the OBN across levels of correlation for a given level of autocorrelation.
The performance seems to decrease slightly at a correlation level of 0.2, but it increases
as the geometry changes with the higher levels of correlation. The OBN continues to
outperform the other methods in this type of problem, especially as the correlation level
increases, because of the same geometric explanation given in the previous problem.
This is the first obvious example where it is better to eliminate the individual classifiers
and treat the entire problem as One Big Network.

Problem 6 Results: 8 Feature XOR with Autocorrelation Case, Across Sample Sizes


Features  were  generated  according  to  the  methodology  described  in  the  Data 
Generation: 8 feature XOR with autocorrelation, and the fusion process was followed as
described  in  the  Experimental  Design  section.  This  process  was  repeated  for  multiple 
sample  sizes.  Figure  35  shows  four  plots,  one  for  each  of  the  four  fusion  methods.  They 
are  the  true  positive  values  for  a  false  positive  value  of  0.1  on  the  average  ROC  curves 
over  the  six  levels  of  correlation  for  a  given  sample  size.  Sample  size  is  varied  from  50 
exemplars  in  each  class  to  1000  exemplars  in  each  class.  Each  plot  contains  three  curves, 
one  for  each  of  the  three  levels  of  autocorrelation. 


Figure 35: 8 Feature XOR with Autocorrelation Case, Across Sample Sizes. Panels:
ISOC, ROC “Within”, PNN, and OBN TP for an FP=0.1 vs sample size (x-axis: sample size
in each class, 0 to 1000; one curve each for AutoCorr=0.0, AutoCorr=0.5, and
AutoCorr=0.9).

For ISOC, ROC “Within,” and OBN fusion, the performance decreases as the
sample size decreases. While this degradation in performance is not dramatic, it is still
present. There are a few exceptions, but this is mostly true across all levels of
autocorrelation for these three methods. The PNN shows a much different result. For the
two lower levels of autocorrelation, it follows the same trend as the other three methods;
as the sample size decreases, the performance decreases. There is an anomaly for the high
level of autocorrelation that is explained by further examination of the posterior
probabilities. Just as the posterior probabilities were correlated with approximately the
same level of across correlation as was input, the posterior probabilities are also
autocorrelated at approximately the level of autocorrelation that was input. Figure 36
shows the feature space of the PNN fusion from a single run at 0.0 level of
autocorrelation, and Figure 37 shows the feature space of the PNN fusion from a single
run at 0.9 level of autocorrelation.


Figure 36: PNN Fusion Feature Space, 0.0 Autocorrelation (x-axis: Posterior
Probability Linear Classifier; y-axis: Posterior Probability Quadratic Classifier; legend:
Class 0, Class 1).

Figure 37: PNN Fusion Feature Space, 0.9 Autocorrelation (x-axis: Posterior
Probability Linear Classifier; legend: Class 0, Class 1).

Since the posterior probabilities are not autocorrelated in Figure 36, there is little
separation between the classes; however, since the posterior probabilities are highly
autocorrelated in Figure 37, there is a great deal of separation between the classes. In
Figure 36 the autocorrelation of the posterior probabilities from the quadratic classifier,
for instance, is -0.09, essentially 0. In Figure 37 the autocorrelation of the posterior
probabilities from the quadratic classifier, for instance, is 0.83. This is especially
evident at the low sample sizes because, with only limited sample sizes at the high
autocorrelation level, the features do not have enough observations to recover from the
high autocorrelation levels. With the larger sample sizes, there will eventually be a great
deal of overlap between the two classes in the feature space. This explains the anomaly
with the PNN fusion in this problem.
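Autocorrelation values like those quoted for the posterior probabilities can be estimated from a sequence of observations. The sketch below uses the standard lag-1 sample autocorrelation. The thesis does not say which estimator or lag produced -0.09 and 0.83, so both the estimator and the name `lag1_autocorrelation` are assumptions, and the input sequences are toy data.

```python
def lag1_autocorrelation(x):
    """Lag-1 sample autocorrelation: lag-1 autocovariance over the variance."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    if denom == 0:
        raise ValueError("sequence is constant; autocorrelation is undefined")
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1))
    return num / denom


# A slowly drifting sequence is positively autocorrelated ...
trend = lag1_autocorrelation([1, 2, 3, 4, 5, 6])
# ... while a sequence that flips sign every step is negatively autocorrelated.
flip = lag1_autocorrelation([1, -1, 1, -1, 1, -1])
```

With only a few observations, a highly autocorrelated sequence stays near its starting region, which is consistent with the separation seen at small sample sizes in Figure 37.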

Problem 6 Results: 8 Feature XOR with Autocorrelation Case, An ANOVA Approach

This same set of data can also be examined using an ANOVA approach.
Consider a three factor design where the three factors are Level of Autocorrelation, Level
of Across Correlation, and Sample Size. Level of Autocorrelation has three levels: 0.0,
0.5, and 0.9. Level of Across Correlation has six levels: 0.0, 0.2, 0.4, 0.6, 0.8, and 0.9.
Sample Size has five levels: 50, 100, 250, 500, and 1000 exemplars in each class. A full
factorial design consists of all possible combinations of these three factors. Each design
point was replicated five times, and the response variable is the average response over the
five replications. Table 8 summarizes the results of each ANOVA.
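The size of this full factorial design follows directly from the factor levels listed above. The sketch below enumerates the design points; the constant names are illustrative only, and the levels are taken from the text.

```python
from itertools import product

AUTOCORR_LEVELS = [0.0, 0.5, 0.9]
CORR_LEVELS = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
SAMPLE_SIZES = [50, 100, 250, 500, 1000]
REPLICATIONS = 5

# Every combination of the three factors is one design point.
design_points = list(product(AUTOCORR_LEVELS, CORR_LEVELS, SAMPLE_SIZES))
total_runs = len(design_points) * REPLICATIONS
```

This gives 3 x 6 x 5 = 90 design points, so the averaged analysis has 90 responses per method, and five replications of each point give 450 individual runs.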

Table 8: Summary of XOR ANOVA Results, Averaged.

Method             ISOC        ROC “Within”   PNN         OBN
R^2                0.874706    0.768371       0.914785    0.944138
Adjusted R^2       0.713143    0.484626       0.810397    0.875708
Mean Response      0.149846    0.162378       0.223818    0.261231
Root MSE           0.027504    0.030867       0.042539    0.04626

Factors Considered and corresponding p-values
Autocorrelation    <.0001      0.0009         <.0001      <.0001
Sample Size        <.0001      <.0001         0.0014      <.0001
Auto*Sample        0.0101      0.1110         <.0001      0.0004
Correlation        0.0183      0.0493         0.8690      <.0001
Auto*Corr          0.4385      0.8881         0.3466      0.0003
Corr*Sample        0.5225      0.7314         0.8049      <.0001

Again, in terms of significant variables, the results of the ANOVA confirm the
results that were shown graphically. For the ISOC Fusion, autocorrelation, sample size,
and their two way interaction have the highest significance. That is, a change in these
variables is most likely to trigger a change in the response. Although autocorrelation did
not appear significant in the high sample size case shown graphically, it has a low
p-value. Across correlation is the least significant of the main effects. For ROC “Within”
Fusion, autocorrelation and sample size are the most significant, which is what was shown
graphically above. For PNN Fusion, autocorrelation, sample size, and their two way
interaction have the highest significance. Again, this confirms the graphical results. For
the OBN, all variables are significant. The OBN seems to be the most sensitive to
changes in all three main effects. Overall, the OBN has the highest mean response, and
the PNN has the second highest mean response. ISOC has the lowest mean response, and
ROC “Within” has the second lowest mean response. The OBN has the highest R^2
values, and the PNN has the second highest R^2 values. The ROC “Within” has the lowest
R^2 values, and the ISOC has the second lowest R^2 values. This different way of looking
at the same data seems to confirm what was already shown graphically.

Validation  of  the  assumptions,  normal  errors  and  constant  variance,  via  residual 
analysis  was  done  for  each  of  the  four  methods.  All  four  followed  the  same  pattern  so 
only  those  for  ISOC  are  shown.  Figure  38  shows  a  histogram  of  the  residuals;  Figure  39 
shows  the  residuals  vs  Row  Number. 


Figure 38: ISOC Histogram of Residuals (x-axis: Residual Value).

Figure 39: ISOC Residual TP Probability vs Row Number (x-axis: Row Number).

Figure  38  shows  that  the  residuals  are  approximately  normally  distributed.  Figure 
39  shows  that  the  residuals  have  approximately  constant  variance.  These  figures  show 
that  the  two  assumptions  of  the  model,  normal  errors  and  constant  variance,  hold  for  this 
analysis. 
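
The residual checks described here can be sketched in a few lines. The following is a minimal illustration and not the analysis code used in this thesis: it fits a main-effects model by least squares to a hypothetical coded design (the three factor columns, the effect sizes, and the noise level are all assumptions), then reports R² together with simple numerical stand-ins for the histogram and the residual vs row number plot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical coded design: 100 runs, three two-level factors coded -1 / +1.
# Column meaning (assumed): across correlation, autocorrelation, sample size.
n_runs = 100
factors = rng.choice([-1.0, 1.0], size=(n_runs, 3))

# Hypothetical response (TP probability) with made-up effects and normal noise.
y = 0.70 + factors @ np.array([0.02, -0.05, 0.08]) + rng.normal(scale=0.02, size=n_runs)


def fit_main_effects(X, y):
    """Ordinary least squares with an intercept; returns coefficients, residuals, R^2."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    return beta, resid, float(r2)


beta, resid, r2 = fit_main_effects(factors, y)

# Normality check (numerical stand-in for a histogram):
# skewness and excess kurtosis should both be near zero for normal errors.
z = (resid - resid.mean()) / resid.std()
skewness = float(np.mean(z**3))
excess_kurtosis = float(np.mean(z**4) - 3.0)

# Constant variance check (stand-in for the residual vs row number plot):
# the residual variance in the first and second halves of the run order
# should be of similar size, so the ratio should be near one.
half = n_runs // 2
variance_ratio = float(resid[:half].var() / resid[half:].var())

print(f"R^2 = {r2:.3f}")
print(f"skewness = {skewness:.3f}, excess kurtosis = {excess_kurtosis:.3f}")
print(f"variance ratio (first half / second half) = {variance_ratio:.3f}")
```

Summary statistics of this kind complement the plots; they do not replace a visual inspection of the residuals.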




Problem 7 Results: 20 Feature without Autocorrelation Case using Feature Selection


Features were generated according to the methodology described under Data
Generation for the 20 feature without autocorrelation case, and the fusion and feature
selection processes were followed as described in the Experimental Design section. This
process was repeated for two sample sizes over 15 replications. For each method and
sample size, the average ROC curve was calculated with and without feature selection
(twelve total ROC curves). Each plot contains six ROC curves, one for each level of
correlation. Figure 40 shows the six ROC curves for a sample size of 50 in each class, and
Figure 41 shows the six ROC curves for a sample size of 1000 in each class.
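
As an illustration of how an average ROC curve over replications can be formed, the sketch below uses vertical averaging: each replication's empirical ROC curve is interpolated onto a common grid of false positive values, and the true positive values are averaged. Vertical averaging is an assumption made for this example (the averaging scheme is not restated in this section), and the Gaussian scores, the class separation of 1.5, and the function names are hypothetical.

```python
import numpy as np


def empirical_roc(scores, labels):
    """Empirical ROC points, sweeping a threshold from high to low score.

    labels: 1 for target, 0 for non-target; a higher score means "more target".
    Returns strictly increasing FP values with the matching TP values.
    """
    order = np.argsort(-scores)
    sorted_labels = labels[order]
    tp = np.cumsum(sorted_labels == 1) / np.sum(labels == 1)
    fp = np.cumsum(sorted_labels == 0) / np.sum(labels == 0)
    fp = np.concatenate([[0.0], fp])
    tp = np.concatenate([[0.0], tp])
    # Keep the last (highest TP) point for each distinct FP value.
    keep = np.r_[fp[1:] != fp[:-1], True]
    return fp[keep], tp[keep]


def average_roc(curves, fp_grid):
    """Vertical averaging: mean TP across curves at each FP value on the grid."""
    tp_on_grid = [np.interp(fp_grid, fp, tp) for fp, tp in curves]
    return np.mean(tp_on_grid, axis=0)


rng = np.random.default_rng(0)
n_per_class, n_reps = 50, 15
labels = np.repeat([0, 1], n_per_class)

curves = []
for _ in range(n_reps):
    # Hypothetical fused scores: non-targets ~ N(0, 1), targets ~ N(1.5, 1).
    scores = np.concatenate([rng.normal(0.0, 1.0, n_per_class),
                             rng.normal(1.5, 1.0, n_per_class)])
    curves.append(empirical_roc(scores, labels))

fp_grid = np.linspace(0.0, 1.0, 101)
avg_tp = average_roc(curves, fp_grid)
print("average TP at FP = 0.1:", round(float(avg_tp[10]), 3))
```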


Figure 40: 20 Feature without Autocorrelation Case, N=50. Panels: ISOC W/ Feature Selection; ROC "Within" W/ Feature Selection; ISOC W/O Feature Selection; ROC "Within" W/O Feature Selection.




From these plots, it is obvious that in this particular problem, reducing the
dimensionality of the feature set does not decrease performance in terms of ROC curves.
There is not a big difference between the ROC curves with feature selection and without
feature selection. This means that the feature set can be significantly reduced via feature
selection without decreasing fusion performance.

For the low sample size problem, the feature selection process was not consistent.
Sometimes the good features had high loadings, and sometimes they had low loadings.
Sometimes the redundant features had high loadings, and sometimes they had low
loadings. Sometimes even the noise features had high loadings, and sometimes they had
low loadings. Regardless, similar performance was obtained using a significantly
reduced feature set resulting from the feature selection.




Figure 41: 20 Feature without Autocorrelation Case, N=1000. Panels: ISOC W/ Feature Selection; ISOC W/O Feature Selection; ROC "Within" W/ Feature Selection; ROC "Within" W/O Feature Selection.

From  these  plots,  as  in  the  low  sample  size  problem,  it  is  obvious  that  in  this  high 
sample  size  problem,  reducing  the  dimensionality  of  the  feature  set  does  not  decrease 
performance  in  terms  of  ROC  curves.  There  is  not  a  big  difference  between  the  ROC 
curves  with  feature  selection  and  without  feature  selection.  This  means  that  the  feature 
set  can  be  significantly  reduced  via  feature  selection  without  decreasing  fusion 
performance. 

For  the  high  sample  size  problem,  the  feature  selection  process  was  very 
consistent.  In  all  fifteen  runs,  both  the  good  features  and  all  four  redundant  features  had 
loadings greater than 0.45. This means that the feature selection process was able to
detect and delete all the noise features but not the redundant features. In addition, similar
results  were  obtained  using  a  much  smaller  feature  set  resulting  from  feature  selection. 
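
The behavior reported for the high sample size case (noise features removed, redundant features retained) can be reproduced with a small loading-threshold sketch. This is only an illustration under stated assumptions: the loadings here are taken to be the absolute correlations between each feature and the first principal component of the correlation matrix, which is a stand-in and not necessarily how the loadings in this thesis were computed, and the synthetic features (one signal, two noisy copies, three noise features) are hypothetical. Only the 0.45 cutoff comes from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # high sample size case

# Hypothetical feature set: columns 0-2 share one underlying signal
# (one "good" feature plus two "redundant" ones); columns 3-5 are pure noise.
signal = rng.normal(size=n)
informative = signal[:, None] + 0.5 * rng.normal(size=(n, 3))
noise = rng.normal(size=(n, 3))
X = np.hstack([informative, noise])


def first_component_loadings(X):
    """Absolute correlation of each feature with the first principal component."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)  # eigenvalues in ascending order
    return np.abs(eigvecs[:, -1]) * np.sqrt(eigvals[-1])


loadings = first_component_loadings(X)
selected = np.flatnonzero(loadings > 0.45)  # cutoff quoted in the text

print("loadings:", np.round(loadings, 2))
print("selected feature indices:", selected.tolist())
```

Because the redundant columns are correlated with the good feature, they load on the same component and pass the cutoff, while the noise columns do not. A loading threshold of this kind therefore removes noise but cannot by itself distinguish a good feature from a redundant copy of it.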

Since  it  is  hard  to  visually  compare  the  ROC  curves,  Figure  42  shows  the  value  of 
true  positive  on  the  ROC  curve  for  a  false  positive  value  of  0.1  for  all  six  values  of 
correlation  for  three  methods  for  the  low  sample  size  problem.  Figure  43  shows  the  value 
of  true  positive  on  the  ROC  curve  for  a  false  positive  value  of  0.1  for  all  six  values  of 
correlation  for  three  methods  for  the  high  sample  size  problem. 
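
Reading the true positive value off an ROC curve at a fixed false positive value amounts to a one-dimensional interpolation. The sketch below shows the idea with `numpy.interp`; the ROC points are made-up numbers for illustration and are not results from this thesis.

```python
import numpy as np


def tp_at_fp(fp, tp, target_fp=0.1):
    """Linearly interpolate the TP value of an ROC curve at a given FP value.

    fp must be increasing; fp and tp are matching ROC coordinates.
    """
    return float(np.interp(target_fp, fp, tp))


# Hypothetical ROC points (not thesis results).
fp_points = np.array([0.0, 0.05, 0.20, 1.0])
tp_points = np.array([0.0, 0.40, 0.70, 1.0])

tp_value = tp_at_fp(fp_points, tp_points, target_fp=0.1)
print("TP at FP = 0.1:", tp_value)
```

Applying the same function to each average ROC curve gives one number per method, correlation level, and feature selection setting, which is easier to compare than overlaid curves.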


Figure 42: True Positive Values vs. Correlation Level for 0.1 False Positive Rate. Three panels (ISOC, ROC "Within", and PNN), each plotting TP values for an FP value of 0.1 by correlation, with and without feature selection, for sample size = 50.

In Figure 42, it is apparent that reducing the feature set via feature selection
results in little or no degradation in performance at the low sample size. In addition,
this shows that these three fusion methods are fairly robust to correlation; that is,
regardless of feature selection, the value of true positive stays nearly constant across
all levels of correlation. Also, the PNN performs comparably to the other two methods
despite being given only half the data that the other two methods are given.


Figure 43: True Positive Values vs. Correlation Level for 0.1 False Positive Rate. Three panels (ISOC, ROC "Within", and PNN), each plotting TP values for an FP value of 0.1 by correlation, with and without feature selection, for sample size = 1000.

Figure 43 shows that, in the high sample size problem, performing feature
selection always does as well as or better than no feature selection, although the
improvement is minimal across all levels of correlation. In addition, this shows that these
three methods are fairly robust to the level of correlation; that is, the value of true
positive stays nearly constant across all levels of correlation. Also, the PNN outperforms
both the ISOC and ROC “Within” Fusion methods despite being given only half the data
of those methods.

Problem 8 Results: 36 Feature without Autocorrelation Case, Single Correlation, Single Sample Size, using Feature Selection

For this problem, the data was generated and the loadings were calculated as
described above. First, the classification accuracy was calculated for each classifier
fusion method using all 18 features for each classifier. Next, the features with the lowest
six loadings were excluded and the classification accuracy was recalculated. Then, the
classification accuracy was calculated for one feature up to twelve features for each
individual classifier (e.g., when the number of features was five, the three methods used
the five features with the highest loadings). These values were plotted for each fusion
method in Figure 44.
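
The procedure of ranking features by loading and recomputing accuracy with the top k features can be sketched as follows. This is a simplified, hypothetical illustration: a single nearest-class-mean classifier stands in for the fusion methods, the loadings are approximated by the absolute correlation between each feature and the class label, and the synthetic data (two informative features out of 18) are assumptions, not the 36 feature problem itself.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_class, n_features = 50, 18


def make_data():
    """Two classes; only the first two features carry class information (assumed)."""
    class0 = rng.normal(size=(n_per_class, n_features))
    class1 = rng.normal(size=(n_per_class, n_features))
    class1[:, :2] += 1.5
    X = np.vstack([class0, class1])
    y = np.repeat([0, 1], n_per_class)
    return X, y


X_train, y_train = make_data()
X_test, y_test = make_data()

# Stand-in loadings: absolute correlation of each feature with the class label.
loadings = np.abs([np.corrcoef(X_train[:, j], y_train)[0, 1]
                   for j in range(n_features)])
ranked = np.argsort(-loadings)  # feature indices, highest loading first


def nearest_mean_accuracy(cols):
    """Test accuracy of a nearest-class-mean rule using only the given columns."""
    mean0 = X_train[y_train == 0][:, cols].mean(axis=0)
    mean1 = X_train[y_train == 1][:, cols].mean(axis=0)
    dist0 = ((X_test[:, cols] - mean0) ** 2).sum(axis=1)
    dist1 = ((X_test[:, cols] - mean1) ** 2).sum(axis=1)
    predicted = (dist1 < dist0).astype(int)
    return float((predicted == y_test).mean())


# One to twelve features, plus the full set of 18, as in the text.
feature_counts = list(range(1, 13)) + [18]
accuracy = {k: nearest_mean_accuracy(ranked[:k]) for k in feature_counts}

for k, acc in accuracy.items():
    print(f"{k:2d} features: accuracy = {acc:.2f}")
```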




Figure 44: Classification Accuracy vs. Number of Features for 3 Fusion Methods. Three panels (ISOC, ROC "Within", and PNN), each plotting classification accuracy (CA) against the number of features for each classifier (up to 18), for 1 run with sample size = 50.

In all three plots, the graph is relatively flat, indicating that in this run, the
classification accuracy is relatively insensitive to the number of features that the
individual classifiers observe. This means that, potentially, very few features could be
used for the individual classifiers underlying the fusion, via feature selection, while
maintaining the same level of performance as having many features. There seems to be
some rising and falling of classification accuracy, but all increases and decreases are
extremely mild.




Problem  9  Results:  TPM  Exploration 

One observation recurs in all or part of the analysis above. There seems to be a
declining trend in terms of fusion performance (i.e., ROC curves) as the correlation
between feature sets increases, but there also seems to be a point, usually at a higher level
of correlation, where fusion performance actually benefits from the level of correlation
between feature sets. To more fully understand this phenomenon, the TPM was
calculated for different values of correlation between features for two of the problems
already explored above. Figure 45 shows the TPM values vs correlation values for the 4
feature without autocorrelation case.

Figure 45: TPM Values vs Correlation, 4 Feature No Autocorrelation Case (vertical axis: TPM; horizontal axis: correlation, 0 to 1).

In  Figure  45,  it  is  apparent  that  the  TPM  rises  as  the  correlation  increases  from  0.0 
to  0.84,  but  there  is  a  sharp  decline  after  this  point.  The  TPM  at  0.99  correlation  actually 
drops  below  the  TPM  at  0.00  correlation. 
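
A rise and subsequent fall of TPM with correlation can be reproduced in a simple textbook setting, which may help in interpreting the shape of the curve. The sketch below is not the 4 feature problem of this thesis; it assumes two Gaussian classes with equal priors, two unit-variance features sharing a common correlation rho, and a hypothetical mean difference of (1.0, 0.5). Under those assumptions the TPM of the optimal rule is Phi(-Delta/2), where Delta is the Mahalanobis distance between the class means and Delta^2 = (d1^2 - 2*rho*d1*d2 + d2^2) / (1 - rho^2).

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tpm_two_features(rho, d1=1.0, d2=0.5):
    """TPM for two equal-prior Gaussian classes with a common covariance.

    Covariance is [[1, rho], [rho, 1]]; (d1, d2) is the (hypothetical)
    difference between the class means.
    """
    delta_sq = (d1**2 - 2.0 * rho * d1 * d2 + d2**2) / (1.0 - rho**2)
    return normal_cdf(-math.sqrt(delta_sq) / 2.0)


correlations = [0.0, 0.25, 0.5, 0.75, 0.9, 0.99]
tpm = {rho: tpm_two_features(rho) for rho in correlations}

for rho, value in tpm.items():
    print(f"rho = {rho:4.2f}: TPM = {value:.4f}")
```

With these assumed means, the TPM peaks near rho = 0.5 and then falls well below its value at rho = 0 as rho approaches 1. The location of the peak depends on the mean difference, so the 0.84 peak reported for Figure 45 should not be expected from this toy configuration.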




This  led  to  additional  experiments  where  the  correlation  between  feature  sets  had 
only  three  values:  one  at  0.00  correlation,  one  at  the  highest  point  of  the  TPM  plot,  and 
one  at  0.99  correlation.  This  process  was  replicated  5  times,  and  the  average  ROC  curves 
were  calculated.  This  was  done  at  two  sample  sizes:  1000  exemplars  in  each  class  and 
50  exemplars  in  each  class.  Figure  46  shows  the  average  ROC  curve  for  each  fusion 
method  with  1000  exemplars  in  each  class  for  the  4  feature  problem.  Figure  47  shows 
the  average  ROC  curve  for  each  fusion  method  with  50  exemplars  in  each  class  for  the  4 
feature  problem. 


Figure 46: 4 Feature Problem, N=1000. Three panels: ISOC Fusion, ROC "Within" Fusion, and PNN Fusion.

The ISOC plot from Figure 46 shows again that the ISOC Fusion is the most
robust to correlation. It is very hard to tell the difference between any of the three
correlation levels. Thus, the ISOC plot does not show the results expected from the TPM
calculations. The ROC “Within” plot shows that the 0.00 correlation curve is better on
average than the two higher correlation levels, but the two curves for 0.84 and 0.99
correlation levels are almost identical. Thus, the ROC “Within” plot does not show the
results expected from the TPM calculations. The PNN plot finally shows some
separation between the three correlation levels. The 0.99 correlation was expected to
outperform both the 0.00 correlation and the 0.84 correlation. While the 0.99 correlation
does not outperform the 0.00 correlation, it does outperform the 0.84 correlation.


Figure 47: 4 Feature Problem, N=50. Three panels (ISOC Fusion, ROC "Within" Fusion, and PNN Fusion), each plotting ROC curves against P(FP) for corr=0.00, corr=0.84, and corr=0.99.


The ISOC plot from Figure 47 shows a slightly different result from Figure 46. The
0.99 correlation actually outperforms the other two correlations at some points on the
ROC curves. This is what was expected from the TPM calculations. The ROC “Within”
plot shows nearly the same results as Figure 46. While there is more separation between
the higher correlation levels, the 0.00 correlation still outperforms the 0.99 correlation.
The PNN plot in Figure 47 also shows nearly the same result as Figure 46. The 0.99
correlation outperforms the 0.84 correlation, but it does not outperform the 0.00
correlation.

Chapter  Summary 

This  chapter  provided  the  details  of  the  findings  and  analysis  of  this  thesis. 
Results  from  each  of  the  problems  posed  in  Chapter  3  were  presented  in  this  chapter. 
Insights  resulting  from  each  analysis  were  also  given. 




V.  Conclusion


Introduction 

This  chapter  concludes  the  thesis  research.  First,  the  major  literature  review 
findings  are  presented,  and  the  general  methodology  employed  is  reviewed.  The  major 
results  of  this  research  are  summarized,  and  recommendations  for  future  research  are 
discussed. 

Literature  Review  Findings 

Today, the United States Air Force is focused on accurate and timely
targeting. Air Force doctrine states that targets should not be struck with only single
source intelligence (AFPAM 14-210, 1998). Instead, intelligence information from more
than one source should be fused together in order to ensure a higher degree of accuracy
(AFPAM 14-210, 1998). This higher degree of accuracy ensures less fratricide in combat
operations.

There are many different ways to fuse data, and data can be fused at a variety of
different levels. Many of these methods of fusing data assume that the inputs to the
fusion are independent, and little is known about what happens when the inputs to the
fusion are not independent (Willett et al., 2000). In this research, four different ways of
fusing information are exercised. The first two models, ISOC fusion (Haspert, 2000) and
ROC “Within” fusion (Oxley and Bauer, 2002), are classifier fusion techniques that
assume that the individual classifiers are independent. The third fusion method, PNN
fusion, is a classifier fusion technique that makes no assumption about the independence
of the classifiers. The fourth fusion method, OBN fusion, is not a classifier fusion
technique. It simply treats all the individual features as inputs to one big network. It
makes no assumption about the independence of the features.

Methodology  Employed 

Data  was  generated  for  a  variety  of  problems  for  this  thesis.  Correlation  was 
introduced in two forms to the process: across correlation and autocorrelation. The levels
of these correlations were varied to observe how each method reacted to the correlation
and  to  observe  how  these  methods  compared  to  each  other  at  the  same  levels  of 
correlation.  In  addition,  sample  size  was  varied  throughout.  In  some  cases,  a  feature 
selection  process  was  performed  to  compare  how  the  fusion  performed  in  the  presence 
and  absence  of  feature  selection  both  within  and  across  the  fusion  methods.  Finally, 
some  explorations  in  TPM  were  performed. 

Results 

This thesis yielded many interesting results, and a great deal of insight into the
fusion process was obtained. Problem 1 and problem 2 possess a very similar structure,
and they both provide similar insight. They both show that, in this type of problem, the
PNN and OBN are superior to the ISOC and ROC “Within” fusion methods at all levels
of across correlation. While the ISOC and ROC “Within” fusion methods are very robust
to the across correlation, they do not perform as well as the other two methods. In fact,
the PNN outperforms the other three methods while observing only half the data that the
other three methods observe. This trend is true regardless of sample size.

Problem 3 introduces autocorrelation into the fusion process. Despite this
addition of autocorrelation, all four methods exhibit the same trends in terms of across
correlation and sample size as those observed in problem 1 and problem 2. In problem 3,
ISOC, ROC “Within,” and OBN are susceptible to autocorrelation, especially at low
sample sizes. The PNN seems to be robust to autocorrelation in this type of problem.

Problem  4  is  the  first  problem  where  something  unexpected  occurred.  For  all  four 
methods,  increasing  the  level  of  correlation  actually  improves  the  performance  of  the 
fusion.  This  is  easily  explained  with  a  geometric  interpretation  of  the  problem.  Again, 
the ISOC and ROC “Within” fusion methods are very robust to the across correlation, but
they are always outperformed by the PNN and OBN. This is true across all sample sizes.

Up  until  this  point,  the  PNN  and  OBN  had  been  performing  very  similarly;  they 
always  outperformed  the  other  two  methods.  In  problem  5,  the  PNN  only  performs  as 
well  as  the  ISOC  and  ROC  “Within”  fusion  methods;  on  the  other  hand,  the  OBN 
outperforms  the  other  three  methods  at  all  levels  of  correlation.  Problem  6  showed  that 
the  OBN  continued  to  outperform  the  other  three  methods  in  the  presence  of 
autocorrelation. This is true in almost every case; the exception is that the PNN
outperforms the OBN in low sample size cases with high autocorrelation. This is another
case that is counter-intuitive, but it is easily explained with a geometric interpretation.

Problem  7  and  problem  8  are  two  problems  in  which  the  number  of  features  is 
increased  so  that  feature  selection  can  be  explored.  In  both  problems,  it  was  shown  that 
feature  selection  can  be  very  beneficial.  Feature  selection  was  used  to  reduce  the 
dimensionality  of  the  problems  without  degradation  in  fusion  performance. 

Problem 9 is a further investigation into problem 1 at only specific levels of
across correlation. There are cases where the TPM will actually decrease at very high
levels of across correlation. The results from problem 9 show that the fusion for a highly
correlated feature set can actually be better than the fusion for a more moderately
correlated feature set.

Overall, some key insights were gained from this research. First, the across
correlation injected into the fusion process is approximately equal to the correlation of
the posterior probabilities of the individual classifiers. Second, the autocorrelation
injected into the fusion process is approximately equal to the autocorrelation of the
posterior probabilities of the individual classifiers. Next, while a high level of correlation
is usually associated with having less information, sometimes this high level of
correlation actually aids the fusion as it changes the geometry of the problem. Next,
ISOC and ROC “Within” seem to be the most robust to across correlation even though
they are the methods that assume independence of the classifiers. Although they are
robust methods, they are always outperformed by one of the other two methods, which do
not make the independence assumption. High levels of autocorrelation seemed to
decrease the performance of each of the fusion methods at all sample sizes, with the
exception of the PNN at low sample sizes. Also, a lower sample size generally results in
lower performance, and there is usually a point beyond which adding more samples will
not necessarily increase performance. Finally, OBN seems to be the most successful
fusion method, as it performed as well as or better than the other three methods for each
of the problems in this thesis.

Recommendations  for  Future  Research 

While this thesis provided a great deal of insight into the fusion process, there is
still much more research that can be done in the field. First, the biggest shortcoming of
this research is that all the data was fabricated. As real-world data sets become available,
this fusion process should be applied to those data sets. In the absence of real-world data,
the feature sets used in this thesis could be expanded so that more noise and redundant
features are added. Next, different classifiers, such as neural networks, could be used as
the individual classifiers instead of using the linear and quadratic discriminant functions.
Also, in concurrent thesis research, fusion has been done with three classifiers, but this
could be extended even further to a larger number of classifiers. Once the number of
classifiers has been extended, classifier selection, similar to feature selection, can be
performed to select only good classifiers to be fused. Finally, all of this research focuses
on the two-class problem; research could be extended to a three-class or higher problem.




Bibliography 


Air Force Doctrine Document 2-1, Air Warfare, 22 January 2000.

Air Force Pamphlet 14-210, USAF Intelligence Targeting Guide, 1 February 1998.

Haspert, J.K., “Optimum ID Sensor Fusion for Multiple Target Types.” IDA Document
D2451, 2000.

Clutz, T., A Framework for Prognostics Reasoning. Air Force Institute of Technology
(AU), Wright-Patterson AFB OH, December 2002.

Fukunaga, Keinosuke, and Raymond R. Hayes, “Effects of Sample Size in Classifier
Design.” IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 11,
No. 8, August 1989, pp. 873-885.

Laine, Trevor, “Relevant Background for the Research of Sensor Fusion as Applied to
Automatic Target Detection/Recognition (ATD/R) in an Environment with Correlated
Input Data.” Advanced Applications for ANNs Special Study, 2003.

Oxley, M.E., and K. Bauer, “Classifier Fusion for Improved System Performance.”
AFIT/ENS Working Document 02-02, 2002.

Ralston, J.M., “Bayesian Sensor Fusion for Minimum-Cost I.D. Declaration.” 1998 Joint
Service Combat Identification Systems Conference on Requirements, Technologies and
Developments (CISC-98), Volume 1 - Technical Proceedings.

Shipp, C.A., and L.I. Kuncheva, “Relationships between combination methods and
measures of diversity in combining classifiers.” Information Fusion, vol 3, iss 2: 2 June
2002, pp. 135-138.

Storm, S.A., K. Bauer, and M. Oxley, “An Investigation in the Effects of Correlation in
Classifier Fusion.” Intelligent Engineering Systems Through Artificial Neural Networks,
vol 13: 2003, pp. 619-624.

Wasserman, P.D., Advanced Methods in Neural Computing. Van Nostrand Reinhold, 1993.

Willett, P., P.F. Swaszek, and R.S. Blum, “The good, bad and ugly: distributed detection
of a known signal in dependent Gaussian noise.” IEEE Transactions on Signal
Processing, vol 48, iss 12: 12 December 2000, pp. 3266-3279.




Vita 


Captain Nathan J. Leap graduated from Bishop McCort High School in
Johnstown, Pennsylvania. He entered undergraduate studies at the United States Air
Force Academy (USAFA) where he graduated with a Bachelor of Science Degree in
Operations Research in June 1999. He was commissioned through USAFA with a
Reserve Commission.

His first assignment was at Eglin AFB as an operations research analyst for the
28th Test Squadron, 53rd Test and Evaluation Group, 53rd Wing. In August 2003, he
entered the Graduate School of Engineering and Management, Air Force Institute of
Technology. Upon graduation, he will be assigned to Air Force Materiel Command,
Wright-Patterson AFB, Ohio.




REPORT  DOCUMENTATION  PAGE 

Form  Approved 

OMB  No.  074-0188 


1. REPORT DATE (DD-MM-YYYY): 03-2004

2. REPORT TYPE: Master’s Thesis

3. DATES COVERED (From - To): Jun 2003 - Mar 2004

4. TITLE AND SUBTITLE: AN INVESTIGATION OF THE EFFECTS OF CORRELATION, AUTOCORRELATION, AND SAMPLE SIZE IN CLASSIFIER FUSION

5a-5f. CONTRACT NUMBER, GRANT NUMBER, PROGRAM ELEMENT NUMBER, PROJECT NUMBER, TASK NUMBER, WORK UNIT NUMBER: (none listed)

6. AUTHOR(S): Leap, Nathan J., Capt., USAF

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Air Force Institute of Technology, Graduate School of Engineering and Management (AFIT/EN), 2950 Hobson Way, Building 641, WPAFB OH 45433-7765

8. PERFORMING ORGANIZATION REPORT NUMBER: AFIT/GOR/ENS/04-06

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): AFOSR, Attn: Major Juan R. Vasquez, 801 North Randolph Street, Room 933, Arlington, VA 22203-1977; (703) 696-8431; e-mail: Juan.Vasquez@afosr.af.mil

10. SPONSOR/MONITOR’S ACRONYM(S): AFOSR

11. SPONSOR/MONITOR’S REPORT NUMBER(S): (none listed)

12. DISTRIBUTION/AVAILABILITY STATEMENT: APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED.

13. SUPPLEMENTARY NOTES: (none listed)

14. ABSTRACT

This thesis extends the research found in Storm, Bauer, and Oxley, 2003. Data correlation effects and sample size effects on three
classifier fusion techniques and one data fusion technique were investigated. Identification System Operating Characteristic Fusion
(Haspert, 2000), the Receiver Operating Characteristic “Within” Fusion method (Oxley and Bauer, 2002), and a Probabilistic Neural
Network were the three classifier fusion techniques; a Generalized Regression Neural Network was the data fusion technique.
Correlation was injected into the data set both within a feature set (autocorrelation) and across feature sets for a variety of
classification problems, and sample size was varied throughout. Total Probability of Misclassification (TPM) was calculated for some
problems to show the effect of correlation on TPM. Feature selection was performed in some experiments to show the effects of
selecting only certain features. Finally, experiments were designed and analyzed using analysis of variance to identify what factors had
the most significant impact on fusion algorithm performance.

15. SUBJECT TERMS: Sensor Fusion, Classifier Fusion, Classification, Probabilistic Neural Network, ISOC Fusion, ROC Curve Fusion, Correlation, Autocorrelation, Sample Size

16. SECURITY CLASSIFICATION OF: (values not legible)

17. LIMITATION OF ABSTRACT: (value not legible)

18. NUMBER OF PAGES: (value not legible)

19a. NAME OF RESPONSIBLE PERSON: Kenneth W. Bauer, AFIT/ENS

19b. TELEPHONE NUMBER (Include area code): (937) 255-6565, ext 4328; e-mail: Kenneth.Bauer@afit.edu


